Websites, videos, PDFs — one API call, structured JSON output, RAG-ready chunks. No more messy HTML parsing.
100 requests/day free · No credit card
```python
from crawlkit import CrawlKit

ck = CrawlKit("your-api-key")

# Any webpage → structured data
page = ck.scrape("https://vnexpress.net/article")
# → {title, author, date, content, chunks[]}

# Same API for video; the platform is auto-detected
video = ck.scrape("https://youtube.com/watch?v=dQw4w9WgXcQ")
# → {transcript, duration, chapters, chunks[]}
```
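The comment above says the platform is auto-detected. One plausible way to do that is to inspect the URL's hostname, and the sketch below assumes that approach. `detect_platform` and the host table are hypothetical and not part of the CrawlKit SDK.

```python
from urllib.parse import urlparse

# Hypothetical host → platform table; the real service's list may differ.
VIDEO_HOSTS = {
    "youtube.com": "youtube",
    "youtu.be": "youtube",
    "tiktok.com": "tiktok",
    "facebook.com": "facebook",
    "fb.watch": "facebook",
}


def detect_platform(url: str) -> str:
    """Return a video platform name for known hosts, else 'web'."""
    host = (urlparse(url).hostname or "").lower()
    for domain, platform in VIDEO_HOSTS.items():
        # Match the domain itself or any subdomain (www., m., ...).
        if host == domain or host.endswith("." + domain):
            return platform
    return "web"


print(detect_platform("https://youtube.com/watch?v=dQw4w9WgXcQ"))  # youtube
print(detect_platform("https://vnexpress.net/article"))            # web
```

Matching on the hostname rather than a substring of the whole URL avoids false positives such as `notyoutube.com`.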
Every AI builder hits the same walls
Firecrawl, Crawl4AI, and Jina can't extract video transcripts.
CrawlKit extracts transcripts from YouTube, TikTok, and Facebook Video in seconds.
```python
from crawlkit import CrawlKit

ck = CrawlKit("your-api-key")

result = ck.scrape("https://youtube.com/watch?v=abc")

# Full transcript
print(result.structured["transcript"])

# Duration, views, chapters
print(result.structured["duration"])  # 1344
print(result.structured["views"])     # 2.4M
print(result.structured["chapters"])

# RAG chunks by timestamp
for chunk in result.chunks:
    print(chunk.text, chunk.tokens)
```
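The `chunk.text` / `chunk.tokens` loop above iterates over timestamp-based chunks. As a rough illustration of what timestamp chunking can look like, here is a minimal sketch that groups transcript segments into fixed time windows. The `Segment` and `Chunk` types, the 60-second window, and the word-count token estimate are assumptions, not CrawlKit's documented behavior.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the start of the video
    end: float
    text: str


@dataclass
class Chunk:
    start: float
    end: float
    text: str
    tokens: int  # rough estimate: whitespace-separated words


def _merge(segs: list[Segment]) -> Chunk:
    """Collapse a run of segments into one chunk."""
    text = " ".join(s.text for s in segs)
    return Chunk(segs[0].start, segs[-1].end, text, len(text.split()))


def chunk_by_timestamp(segments: list[Segment], window: float = 60.0) -> list[Chunk]:
    """Group segments so every segment in a chunk starts within `window` seconds of the chunk's first one."""
    chunks: list[Chunk] = []
    current: list[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        # Close the current chunk once this segment starts outside its window.
        if current and seg.start - current[0].start >= window:
            chunks.append(_merge(current))
            current = []
        current.append(seg)
    if current:
        chunks.append(_merge(current))
    return chunks


segments = [
    Segment(0.0, 4.5, "Welcome to the channel."),
    Segment(4.5, 30.0, "Today we look at web scraping."),
    Segment(61.0, 70.0, "First, static pages."),
]
for chunk in chunk_by_timestamp(segments):
    print(f"[{chunk.start:.0f}-{chunk.end:.0f}s] {chunk.text} ({chunk.tokens} tokens)")
# [0-30s] Welcome to the channel. Today we look at web scraping. (10 tokens)
# [61-70s] First, static pages. (3 tokens)
```

Keeping each chunk's start and end times lets a RAG answer cite the exact moment in the video it came from.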
Everything you need to feed your LLM clean data
10+ domain-specific parsers for news, legal documents, real estate, finance, and video, auto-detected per page.
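Auto-detecting which domain parser to use is often done with a registry that pairs each parser with a test on the URL or HTML and falls back to a generic parser. This sketch assumes that pattern; `register`, `pick_parser`, the matching rules, and the stub parsers are all hypothetical, not CrawlKit's internals.

```python
from typing import Callable

Parser = Callable[[str], dict]
Matcher = Callable[[str, str], bool]  # (url, html) -> does this parser apply?

# Hypothetical registry, checked in registration order.
_REGISTRY: list[tuple[str, Matcher, Parser]] = []


def register(name: str, matches: Matcher) -> Callable[[Parser], Parser]:
    """Decorator that records a parser together with its matcher."""
    def wrap(fn: Parser) -> Parser:
        _REGISTRY.append((name, matches, fn))
        return fn
    return wrap


@register("news", lambda url, html: 'property="og:type" content="article"' in html)
def parse_news(html: str) -> dict:
    return {"type": "news"}  # stub: a real parser would extract title, author, date...


@register("legal", lambda url, html: "/law/" in url)
def parse_legal(html: str) -> dict:
    return {"type": "legal"}  # stub


def parse_generic(html: str) -> dict:
    return {"type": "generic"}  # fallback when no specialized parser matches


def pick_parser(url: str, html: str) -> tuple[str, Parser]:
    """Return the first registered parser whose matcher accepts the page."""
    for name, matches, fn in _REGISTRY:
        if matches(url, html):
            return name, fn
    return "generic", parse_generic


name, parser = pick_parser("https://example.com/law/civil-code", "<html></html>")
print(name, parser("<html></html>"))  # legal {'type': 'legal'}
```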
Smart chunking by content structure: articles by paragraph, legal documents by clause, video by timestamp.
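Paragraph-based chunking for articles can be sketched as splitting on blank lines and then packing whole paragraphs into chunks under a token budget. This minimal sketch assumes a word-count token estimate and a hypothetical `max_tokens` parameter; clause-level splitting for legal text would need its own boundary rules.

```python
import re


def chunk_by_paragraph(text: str, max_tokens: int = 200) -> list[str]:
    """Pack whole paragraphs into chunks of at most `max_tokens` words.

    A single paragraph longer than the budget becomes its own chunk
    rather than being cut mid-sentence.
    """
    # Paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    used = 0
    for para in paragraphs:
        n = len(para.split())  # rough token estimate
        if current and used + n > max_tokens:
            chunks.append("\n\n".join(current))
            current, used = [], 0
        current.append(para)
        used += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


article = "First paragraph here.\n\nSecond one is a bit longer.\n\nThird."
for c in chunk_by_paragraph(article, max_tokens=9):
    print(repr(c))
```

Never splitting inside a paragraph keeps each chunk self-contained, at the cost of chunks that vary in size.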
Full transcript extraction from YouTube, TikTok, and Facebook in 2–3 seconds, with no video download.
Auto-detects static vs dynamic pages. Playwright rendering when needed, httpx when not.
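One common way to choose between a plain HTTP fetch and a headless browser is to check whether the raw HTML already contains meaningful text or is an empty JavaScript app shell. The heuristic below is a hypothetical sketch using only the standard library; `needs_browser`, the 200-character threshold, and the list of mount-point ids are assumptions, and the source does not describe CrawlKit's actual detection logic.

```python
from html.parser import HTMLParser


class _TextProbe(HTMLParser):
    """Measures visible text and notes common single-page-app mount points."""

    SKIP = {"script", "style", "noscript", "template"}
    SPA_ROOT_IDS = {"root", "app", "__next", "__nuxt"}  # assumed list

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.text_chars = 0
        self.has_spa_root = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        if dict(attrs).get("id") in self.SPA_ROOT_IDS:
            self.has_spa_root = True

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Count only text outside script/style-like elements.
        if not self._skip_depth:
            self.text_chars += len(data.strip())


def needs_browser(html: str, min_text_chars: int = 200) -> bool:
    """Guess whether a page must be rendered with JavaScript (e.g. via Playwright)."""
    probe = _TextProbe()
    probe.feed(html)
    probe.close()
    # Little server-rendered text plus an app mount point suggests client-side rendering.
    return probe.text_chars < min_text_chars and probe.has_spa_root


spa_shell = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
rendered_page = "<html><body><div id='root'><p>" + "Server-rendered text. " * 20 + "</p></div></body></html>"
print(needs_browser(spa_shell))      # True
print(needs_browser(rendered_page))  # False
```

Requiring both signals keeps small static pages on the cheaper HTTP path and reserves the browser for pages that look like empty app shells.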
Cloudflare protection, rate limiting, and CAPTCHAs are handled automatically with retry logic.
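Retry logic for rate limits is commonly implemented as exponential backoff with jitter. The sketch below is a generic, hypothetical helper (`retry_with_backoff`, `TransientError`, and the default delays are assumptions) that covers only the retry part; how CrawlKit handles Cloudflare challenges or CAPTCHAs is not described in the source.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Raised for failures worth retrying, e.g. HTTP 429 or 503 (hypothetical)."""


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying on TransientError with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # The delay cap doubles each attempt (base_delay * 2**attempt), bounded by max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))  # full jitter spreads clients' retries apart
    raise AssertionError("unreachable")


calls = {"n": 0}


def flaky_fetch() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("429 Too Many Requests")
    return "<html>ok</html>"


print(retry_with_backoff(flaky_fetch, sleep=lambda s: None))  # <html>ok</html>
print(calls["n"])  # 3
```

Injecting `sleep` lets tests run without real delays, and full jitter keeps many clients from retrying in lockstep.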
Adapts to website changes. Learns optimal selectors and extraction patterns over time.
| Feature | CrawlKit | Firecrawl | Crawl4AI | Jina |
|---|---|---|---|---|
| Web crawling | ✓ | ✓ | ✓ | ✓ |
| Video transcripts | ✓ | ✗ | ✗ | ✗ |
| RAG chunks | ✓ | ✓ | ✓ | ✗ |
| PDF extraction | ✓ | ✓ | ✗ | ✗ |
| Domain parsers | 10+ | ✗ | ✗ | ✗ |
| Free tier | 100 requests/day | 500 credits | Open source | 1M tokens |
Start free. Upgrade when you need more.
Get your free API key and start extracting data in under 2 minutes.