The only crawler that handles web + video

Turn any URL into
structured data for AI

Websites, videos, PDFs — one API call, structured JSON output, RAG-ready chunks. No more messy HTML parsing.

100 requests/day free · No credit card

example.py
from crawlkit import CrawlKit

ck = CrawlKit("your-api-key")

# Any webpage → structured data
page = ck.scrape("https://vnexpress.net/article")
# → {title, author, date, content, chunks[]}

# Same API for video — auto-detects platform
video = ck.scrape("https://youtube.com/watch?v=dQw4w9WgXcQ")
# → {transcript, duration, chapters, chunks[]}

Used with: YouTube, TikTok, VnExpress, CafeF, TVPL, GitHub, PDF, and any URL

Web data for AI is broken

Every AI builder hits the same walls

😤 Without CrawlKit

  • JS-rendered pages return empty HTML
  • Cloudflare blocks your requests
  • Hours cleaning messy HTML output
  • RAG chunks are garbage → LLM hallucinates
  • Video transcripts? Build your own extractor
  • Each website needs custom parsing logic

✨ With CrawlKit

  • Auto JS rendering when needed
  • Anti-bot bypass built in
  • Clean structured JSON, ready to use
  • Smart chunks with token counts for RAG
  • YouTube, TikTok transcripts in 2-3 seconds
  • 10+ parsers auto-detect content type
🎬 Unique Feature

Video intelligence
no one else has

Firecrawl can't do it. Crawl4AI can't do it. Jina can't do it.
CrawlKit extracts transcripts from YouTube, TikTok, and Facebook Video in seconds.

  • 2-3s per video, regardless of length
  • 0 MB downloaded: no video download, text + metadata only
  • RAG chunks by timestamp, ready for LLM

video_crawl.py
from crawlkit import CrawlKit

ck = CrawlKit("your-api-key")

result = ck.scrape("https://youtube.com/watch?v=abc")

# Full transcript
print(result.structured["transcript"])

# Duration, views, chapters
print(result.structured["duration"])  # 1344
print(result.structured["views"])     # 2.4M

# RAG chunks by timestamp
for chunk in result.chunks:
    print(chunk.text, chunk.tokens)
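
To make "chunks by timestamp" concrete, here is a minimal sketch of the idea. It is not CrawlKit's internal code: the `(start_seconds, text)` segment shape, the `chunk_by_time` helper, and the 60-second window are all assumptions for illustration.

```python
def chunk_by_time(segments, window=60.0):
    """Group (start_seconds, text) transcript segments into fixed time windows.

    Hypothetical helper; assumes segments are already sorted by start time.
    """
    chunks = []
    for start, text in segments:
        index = int(start // window)
        # Open a new chunk when the segment falls into a later window.
        if not chunks or chunks[-1]["index"] != index:
            chunks.append({"index": index, "start": index * window, "texts": []})
        chunks[-1]["texts"].append(text)
    return [{"start": c["start"], "text": " ".join(c["texts"])} for c in chunks]


# Made-up transcript segments for illustration.
segments = [
    (0.0, "Welcome back."),
    (12.5, "Today we cover crawling."),
    (61.0, "First, rendering."),
    (130.2, "Finally, chunking."),
]
chunks = chunk_by_time(segments)
```

Each chunk keeps its window start time, so an LLM answer can point back to the moment in the video it came from.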

Built for AI pipelines

Everything you need to feed your LLM clean data

🧠

Smart Parsers

10+ domain-specific parsers. News, legal docs, real estate, finance, video — auto-detected.
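
One simple way auto-detection like this could work is to route on the URL before fetching anything. The sketch below is an assumption, not CrawlKit's implementation: the `PARSERS` table, the parser names, and `detect_parser` are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical domain-to-parser table; the names are illustrative only.
PARSERS = {
    "youtube.com": "video",
    "tiktok.com": "video",
    "vnexpress.net": "news",
    "github.com": "code",
}


def detect_parser(url):
    """Pick a parser name from the URL's host or file extension."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # File type takes priority over domain.
    if parsed.path.lower().endswith(".pdf"):
        return "pdf"
    for domain, parser in PARSERS.items():
        # Match the domain itself or any subdomain of it.
        if host == domain or host.endswith("." + domain):
            return parser
    return "generic"
```

Anything that matches no rule falls through to a generic parser, which is what "+ any URL" would need.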

📦

RAG-Ready Chunks

Smart chunking by content structure — articles by paragraph, legal by clause, video by timestamp.
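
For articles, chunking by paragraph with a token count per chunk might look roughly like the sketch below. The `chunk_paragraphs` helper and the whitespace-based token estimate are assumptions; a real pipeline would likely count tokens with the target model's tokenizer.

```python
def chunk_paragraphs(text, max_tokens=200):
    """Pack whole paragraphs into chunks of at most max_tokens (approximate).

    Hypothetical helper. A single paragraph longer than max_tokens is kept
    whole as its own oversized chunk rather than being split mid-paragraph.
    """
    chunks, current, count = [], [], 0
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        tokens = len(para.split())  # crude estimate: one token per word
        # Flush the current chunk if this paragraph would overflow it.
        if current and count + tokens > max_tokens:
            chunks.append({"text": "\n\n".join(current), "tokens": count})
            current, count = [], 0
        current.append(para)
        count += tokens
    if current:
        chunks.append({"text": "\n\n".join(current), "tokens": count})
    return chunks


article = "First paragraph here.\n\nSecond paragraph is a bit longer.\n\nThird."
chunks = chunk_paragraphs(article, max_tokens=8)
```

Breaking only at paragraph boundaries keeps each chunk readable on its own, which tends to matter more for retrieval quality than hitting an exact size.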

🎬

Video Transcripts

YouTube, TikTok, Facebook — full transcript extraction in 2-3 seconds. No video download.

🌐

JS Rendering

Auto-detects static vs dynamic pages. Playwright rendering when needed, httpx when not.
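
A common way to decide between the two is to fetch with plain HTTP first and fall back to a browser only when the response has almost no visible text. This is a sketch under that assumption, not CrawlKit's actual detection logic: `looks_js_rendered`, `fetch`, and the 200-character threshold are hypothetical.

```python
import re


def looks_js_rendered(html, min_text_chars=200):
    """Heuristic: very little visible text after stripping scripts, styles,
    and tags suggests the page is rendered client-side."""
    no_scripts = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", html)
    visible = re.sub(r"(?s)<[^>]+>", " ", no_scripts)
    visible = " ".join(visible.split())
    return len(visible) < min_text_chars


def fetch(url):
    """Try a cheap HTTP fetch first; use a headless browser only if needed."""
    import httpx  # third-party, imported lazily

    html = httpx.get(url, follow_redirects=True, timeout=10.0).text
    if not looks_js_rendered(html):
        return html

    from playwright.sync_api import sync_playwright  # third-party

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        html = page.content()
        browser.close()
    return html
```

The heuristic is deliberately cheap: most pages never pay the cost of launching a browser.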

🛡️

Anti-Bot Bypass

Cloudflare, rate limiting, CAPTCHAs — handled automatically with retry logic.
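
The retry part can be illustrated with exponential backoff and jitter. This sketch covers rate limiting only (it does not solve CAPTCHAs or bot challenges), and `fetch_with_retry`, its parameters, and the choice of 429/503 as retryable statuses are assumptions rather than CrawlKit's documented behavior.

```python
import random
import time


def fetch_with_retry(fetch, url, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fetch(url) -> (status, body), retrying on 429/503.

    Hypothetical helper. The delay doubles on each attempt, plus random
    jitter so that many clients do not retry in lockstep.
    """
    for attempt in range(max_attempts):
        status, body = fetch(url)
        if status not in (429, 503):
            return status, body
        if attempt < max_attempts - 1:
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
    # Out of attempts: return the last response so the caller can inspect it.
    return status, body
```

Passing `fetch` and `sleep` in as arguments keeps the helper independent of any HTTP library and easy to exercise with a stub.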

🔄

Learning Engine

Adapts to website changes. Learns optimal selectors and extraction patterns over time.

How we compare

                    CrawlKit   Firecrawl     Crawl4AI   Jina
Web crawling
Video transcripts   Yes        No            No         No
RAG chunks
PDF extraction
Domain parsers      10+
Free tier           100/day    500 credits   OSS        1M tokens

Install in 10 seconds

Python and Node.js SDKs, or just use the REST API

🐍 Python PyPI →
pip install crawlkit
📦 Node.js npm →
npm install paparusi-crawlkit

Simple pricing

Start free. Upgrade when you need more.

Free

$0/mo
  • ✓ 100 requests/day
  • ✓ Basic parsers
  • ✓ Community support
Get Started

Starter (Popular)

$19/mo
  • ✓ 10,000 requests/mo
  • ✓ All parsers + video
  • ✓ Email support
Get Started

Pro

$79/mo
  • ✓ 100,000 requests/mo
  • ✓ Custom parsers
  • ✓ Priority support
Get Started

Enterprise

Custom
  • ✓ Unlimited requests
  • ✓ SLA guarantee
  • ✓ On-premise option
Contact Us

Stop parsing HTML.
Start building AI.

Get your free API key and start extracting data in under 2 minutes.

Start Free →