Self-Hosted Scrapers
Web scrapers with AI-specific outputs for feeding data into LLMs.
Firecrawl
Official self-hosted
Jina Reader
Hacked self-hosted
- FYI jin.ai reader can now be self hosted · open-webui/open-webui · Discussion #5789 · GitHub
- GitHub - intergalacticalvariable/reader: 📚 This is an adapted version of Jina AI's Reader for local deployment using Docker. Convert any URL to an LLM-friendly input with a simple prefix http://127.0.0.1:3000/https://website-to-scrape.com/
Be careful: do NOT use the official jina.ai version - it is a cloud service.
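The URL-prefix convention described above can be sketched in a few lines of Python. This is only an illustration, assuming the adapted reader is running locally on port 3000 as in the repository description, and that it returns plain text in the response body. The function names `build_reader_url` and `read_via_local_reader` are hypothetical helpers, not part of the project.

```python
from urllib.request import urlopen

# Base address taken from the repository description above; change it
# if your container is mapped to a different host or port.
READER_BASE = "http://127.0.0.1:3000/"


def build_reader_url(target_url, reader_base=READER_BASE):
    """Prefix the target URL with the local reader's base address."""
    return reader_base.rstrip("/") + "/" + target_url


def read_via_local_reader(target_url, timeout=30.0):
    """Fetch the reader's output for target_url.

    Assumes the self-hosted reader is already running and answers
    with a text body; the body is decoded as UTF-8.
    """
    with urlopen(build_reader_url(target_url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

With the container up, `read_via_local_reader("https://website-to-scrape.com/")` would request `http://127.0.0.1:3000/https://website-to-scrape.com/`, so nothing is sent to the jina.ai cloud service.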
ScrapeGraphAI
Official self-hosted
Anything LLM
Write your own web scraping code, e.g.
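A minimal sketch of what hand-written scraping code might look like, using only the Python standard library. It is not taken from Anything LLM or any of the tools above. The names `TextExtractor`, `html_to_text`, and `scrape`, and the `User-Agent` string, are hypothetical choices for illustration. It fetches a page and keeps only the visible text, which is roughly the kind of output an LLM wants.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping <script>, <style>, and <noscript>."""

    SKIP_TAGS = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def text(self):
        return "\n".join(self._chunks)


def html_to_text(html):
    """Reduce an HTML string to its text, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def scrape(url, timeout=10.0):
    """Download a page and return its text (hypothetical helper)."""
    request = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return html_to_text(html)
```

This approach only sees the HTML the server sends, so pages that render their content with JavaScript would need a headless browser instead.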