The only crawler that handles web + video

Turn any URL into
structured data for AI

Websites, videos, PDFs — one API call, structured JSON output, RAG-ready chunks. No more messy HTML parsing.

100 requests/day free · No credit card

example.py

from crawlkit import CrawlKit

ck = CrawlKit("your-api-key")

# Any webpage → structured data
page = ck.scrape("https://vnexpress.net/article")
# → {title, author, date, content, chunks[]}

# Same API for video — auto-detects platform
video = ck.scrape("https://youtube.com/watch?v=dQw4w9WgXcQ")
# → {transcript, duration, chapters, chunks[]}

Web data for AI is broken

Every AI builder hits the same walls

😤 Without CrawlKit

✗ JS-rendered pages return empty HTML
✗ Cloudflare blocks your requests
✗ Hours cleaning messy HTML output
✗ RAG chunks are garbage → LLM hallucinates
✗ Video transcripts? Build your own extractor
✗ Each website needs custom parsing logic

✨ With CrawlKit

✓ Auto JS rendering when needed
✓ Anti-bot bypass built in
✓ Clean structured JSON, ready to use
✓ Smart chunks with token counts for RAG
✓ YouTube, TikTok transcripts in 2-3 seconds
✓ 10+ parsers auto-detect content type

🎬 Unique Feature

Video intelligence
no one else has

Firecrawl can't do it. Crawl4AI can't do it. Jina can't do it.
CrawlKit extracts transcripts from YouTube, TikTok, and Facebook Video in seconds.

2-3s per video, regardless of length

0 MB downloaded: text and metadata only, no video file

RAG chunks by timestamp, ready for LLM

video_crawl.py

from crawlkit import CrawlKit

ck = CrawlKit("your-api-key")

result = ck.scrape(
    "https://youtube.com/watch?v=abc"
)

# Full transcript
print(result.structured["transcript"])

# Duration, views, chapters
print(result.structured["duration"])  # 1344
print(result.structured["views"])     # 2.4M

# RAG chunks by timestamp
for chunk in result.chunks:
    print(chunk.text, chunk.tokens)
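Because each chunk carries a token count, packing context for an LLM call can be a simple budget loop. The sketch below assumes chunks expose `text` and `tokens` as in video_crawl.py; the `Chunk` dataclass and the `pick_chunks` helper are hypothetical stand-ins for illustration, not part of the CrawlKit SDK.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical stand-in for the objects in result.chunks."""
    text: str
    tokens: int


def pick_chunks(chunks, max_tokens):
    """Keep chunks in order until the next one would exceed the token budget."""
    picked, used = [], 0
    for chunk in chunks:
        if used + chunk.tokens > max_tokens:
            break
        picked.append(chunk)
        used += chunk.tokens
    return picked


# Example data standing in for a real scrape result.
sample = [Chunk("intro", 120), Chunk("chapter one", 300), Chunk("chapter two", 250)]
context = "\n\n".join(c.text for c in pick_chunks(sample, max_tokens=450))
print(context)
```

Stopping at the first chunk that does not fit keeps the transcript contiguous, which usually matters more for timestamped video chunks than squeezing in a later, smaller chunk.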

Built for AI pipelines

Everything you need to feed your LLM clean data

🧠

Smart Parsers

10+ domain-specific parsers. News, legal docs, real estate, finance, video — auto-detected.
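To give a feel for what auto-detection could look like, here is a minimal sketch that routes a URL to a parser name by host and file extension. The routing table, the parser names, and `detect_parser` are all assumptions made for illustration; the source does not describe how CrawlKit actually selects a parser.

```python
from urllib.parse import urlparse

# Hypothetical routing table: host suffix -> parser name.
PARSERS_BY_HOST = {
    "youtube.com": "video",
    "tiktok.com": "video",
    "vnexpress.net": "news",
}


def detect_parser(url):
    """Return a parser name for the URL, falling back to a generic parser."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # File extension wins over host: a PDF on a news site is still a PDF.
    if parsed.path.lower().endswith(".pdf"):
        return "pdf"
    for suffix, parser in PARSERS_BY_HOST.items():
        if host == suffix or host.endswith("." + suffix):
            return parser
    return "generic"


print(detect_parser("https://youtube.com/watch?v=abc"))
```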

📦

RAG-Ready Chunks

Smart chunking by content structure — articles by paragraph, legal by clause, video by timestamp.
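As an illustration of paragraph-based chunking with token counts, the sketch below groups consecutive paragraphs until a token budget is reached. It is not CrawlKit's chunker: `chunk_by_paragraph` is a hypothetical helper, and tokens are approximated by whitespace-separated words rather than a real tokenizer.

```python
def chunk_by_paragraph(text, max_tokens=200):
    """Group consecutive paragraphs into chunks of at most max_tokens.

    Token counts are approximated by whitespace-separated words; a real
    pipeline would use the target model's tokenizer. A single paragraph
    longer than max_tokens is kept whole as its own chunk.
    """
    chunks, current, current_tokens = [], [], 0
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        n = len(para.split())
        # Flush the running chunk before it would exceed the budget.
        if current and current_tokens + n > max_tokens:
            chunks.append({"text": "\n\n".join(current), "tokens": current_tokens})
            current, current_tokens = [], 0
        current.append(para)
        current_tokens += n
    if current:
        chunks.append({"text": "\n\n".join(current), "tokens": current_tokens})
    return chunks


article = "First paragraph here.\n\nSecond paragraph is here too.\n\nThird one."
for chunk in chunk_by_paragraph(article, max_tokens=8):
    print(chunk["tokens"], repr(chunk["text"]))
```

Splitting only on paragraph boundaries keeps each chunk readable on its own, at the cost of uneven chunk sizes.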

🎬

Video Transcripts

YouTube, TikTok, Facebook — full transcript extraction in 2-3 seconds. No video download.

🌐

JS Rendering

Auto-detects static vs dynamic pages. Playwright rendering when needed, httpx when not.
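One plausible way to decide between a plain HTTP fetch and a browser render is to look at the HTML that comes back first: lots of scripts and almost no visible text suggests a client-rendered page. The sketch below is a heuristic under that assumption, using only the standard library; `needs_js_rendering` and its threshold are hypothetical and not taken from CrawlKit.

```python
from html.parser import HTMLParser


class _TextCounter(HTMLParser):
    """Counts visible text characters and <script> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.text_chars = 0
        self.scripts = 0
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
            if tag == "script":
                self.scripts += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore script and style bodies; count everything else as visible text.
        if not self._skip_depth:
            self.text_chars += len(data.strip())


def needs_js_rendering(html, min_text_chars=200):
    """Guess that a page is client-rendered if it has scripts but little text."""
    counter = _TextCounter()
    counter.feed(html)
    return counter.scripts > 0 and counter.text_chars < min_text_chars


spa_shell = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
print(needs_js_rendering(spa_shell))
```

In a pipeline, a `True` result would be the signal to re-fetch the URL with a headless browser; a `False` result means the cheap fetch already contains the content.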

🛡️

Anti-Bot Bypass

Cloudflare, rate limiting, CAPTCHAs — handled automatically with retry logic.
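Retry logic for rate-limited requests is commonly built on exponential backoff with jitter. The sketch below shows that pattern in isolation; the `Blocked` exception, the `fetch` callable, and the default delays are hypothetical, and this is not a description of how CrawlKit handles blocks internally.

```python
import random
import time


class Blocked(Exception):
    """Hypothetical error raised when a fetch is rate limited or challenged."""


def fetch_with_retry(fetch, url, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fetch(url), retrying on Blocked with exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Blocked:
            if attempt == attempts - 1:
                raise  # out of retries, surface the error to the caller
            # Delay doubles each attempt; jitter avoids synchronized retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.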

🔄

Learning Engine

Adapts to website changes. Learns optimal selectors and extraction patterns over time.
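A simple way to picture "learning optimal selectors" is a success-rate tracker: record whether each candidate selector produced a result, then prefer the one with the best track record. The sketch below assumes exactly that and nothing more; `SelectorStats` and the example selectors are hypothetical, and the source does not say how the Learning Engine works.

```python
from collections import defaultdict


class SelectorStats:
    """Tracks how often each candidate selector yields a non-empty result."""

    def __init__(self):
        self._tries = defaultdict(int)
        self._hits = defaultdict(int)

    def record(self, selector, succeeded):
        self._tries[selector] += 1
        if succeeded:
            self._hits[selector] += 1

    def best(self, candidates):
        """Return the candidate with the highest smoothed success rate."""
        def score(selector):
            # Laplace smoothing so untried selectors start at 0.5.
            return (self._hits[selector] + 1) / (self._tries[selector] + 2)
        return max(candidates, key=score)


stats = SelectorStats()
for _ in range(3):
    stats.record("h1.title", True)
for _ in range(2):
    stats.record("div.headline", False)
print(stats.best(["h1.title", "div.headline", "article h1"]))
```

The smoothing means a selector that has never been tried outranks one that has only failed, so new candidates still get a chance after a site redesign.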

How we compare

Feature	CrawlKit	Firecrawl	Crawl4AI	Jina
Web crawling	✓	✓	✓	✓
Video transcripts	✓	✗	✗	✗
RAG chunks	✓	✓	✓	✗
PDF extraction	✓	✓	✗	✗
Domain parsers	10+	—	—	—
Free tier	100/day	500 credits	OSS	1M tokens

Install in 10 seconds

Python and Node.js SDKs, or just use the REST API

🐍 Python (PyPI)

pip install crawlkit

📦 Node.js (npm)

npm install paparusi-crawlkit

Simple pricing

Start free. Upgrade when you need more.

Free

$0/mo

✓ 100 requests/day
✓ Basic parsers
✓ Community support


Popular

Starter

$19/mo

✓ 10,000 requests/mo
✓ All parsers + video
✓ Email support


Pro

$79/mo

✓ 100,000 requests/mo
✓ Custom parsers
✓ Priority support


Enterprise

Custom

✓ Unlimited requests
✓ SLA guarantee
✓ On-premise option


Stop parsing HTML.
Start building AI.