Common Crawl

CCBot

The Common Crawl bot that builds an open repository of web crawl data used by researchers and AI companies worldwide.

Overview

Operated By
Common Crawl
Purpose
Builds what is widely described as the largest open web crawl dataset, which is used extensively for AI model training.
User-Agent String
CCBot/2.0
Respects robots.txt
Yes
Category
AI training
Website
https://commoncrawl.org
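Because CCBot respects robots.txt, a site owner can opt out by addressing its product token ("CCBot", taken from the user-agent string above) in robots.txt. The sketch below shows how such rules resolve for different crawlers. It uses Python's standard urllib.robotparser module. The robots.txt content, the example.com URL, and the helper name is_allowed are hypothetical and chosen only for illustration. urllib.robotparser uses its own simplified user-agent matching, so Common Crawl's crawler may interpret edge cases differently. robots.txt is also advisory and does not enforce access control.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block CCBot from the whole site,
# while leaving all other crawlers unrestricted.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines and marks the rules as loaded,
    # so can_fetch() can be used without fetching anything over the network.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("CCBot/2.0", "SomeOtherBot/1.0"):
        print(agent, is_allowed(agent, "https://example.com/page"))
```

With these rules, the CCBot/2.0 request is disallowed, and SomeOtherBot/1.0 (also hypothetical) falls through to the catch-all "*" entry and is allowed.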