CCBot
The Common Crawl bot that builds an open repository of web crawl data used by researchers and AI companies worldwide.
Overview
- Operated By: Common Crawl
- Purpose: Builds one of the largest open web crawl datasets, widely used for AI model training.
- User-Agent String: CCBot/2.0
- Respects robots.txt: Yes
- Category: AI training
- Website: https://commoncrawl.org
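Because CCBot respects robots.txt, a site can opt out by adding a `User-agent: CCBot` group. The sketch below uses Python's standard-library `urllib.robotparser` to test such a rule locally before you deploy it. The robots.txt contents and the helper names `is_allowed` and `looks_like_ccbot` are hypothetical. The sketch assumes `CCBot` is the token to match, taken from the user-agent string listed above. `robotparser`'s matching rules approximate how CCBot itself reads the file but are not guaranteed to match it exactly.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block CCBot site-wide, leave all other crawlers unrestricted.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url (per urllib.robotparser)."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def looks_like_ccbot(user_agent_header: str) -> bool:
    """Heuristic check for the CCBot token in a User-Agent header.

    This only inspects the string; any client can send it, so it does not
    verify that a request actually came from Common Crawl.
    """
    return "ccbot" in user_agent_header.lower()
```

For example, `is_allowed(ROBOTS_TXT, "CCBot/2.0", "https://example.com/page")` returns `False`. The same call with `"Googlebot/2.1"` returns `True` because it falls through to the `*` group. Treat `looks_like_ccbot` only as a log-filtering aid: a matching header does not prove the request came from Common Crawl.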