Building Honeypots for AI Bots: What Works and What Doesn't
Experiments in attracting AI crawlers with structured data, llms.txt, sitemaps, and content signals.
Why Build an AI Honeypot?
A honeypot for AI bots is a site designed to be maximally attractive to web crawlers. The goal: understand what signals attract different types of bots, how they behave once they arrive, and what content they find most valuable. We've been running experiments to find out.
Experiment 1: Schema.org Structured Data
We added comprehensive JSON-LD markup (WebSite, Organization, FAQPage, and Article schemas) to every page. Result: Googlebot's crawl frequency increased from once every 3 days to daily within a week. GPTBot showed no measurable change. Hypothesis: Googlebot prioritizes structured-data signals, while AI training crawlers care more about content volume.
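As a concrete illustration, here is a minimal sketch of the kind of Article JSON-LD we embedded in each page. The headline, organization name, and date are placeholders, not our actual values:

```python
import json

# Minimal Article schema of the kind described above; all values are placeholders.
article_ld = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Building Honeypots for AI Bots",
    "author": {"@type": "Organization", "name": "Example Co"},
    "datePublished": "2025-01-01",
}

def jsonld_script(data: dict) -> str:
    """Serialize structured data into the <script> tag embedded in each page."""
    return '<script type="application/ld+json">%s</script>' % json.dumps(data)

print(jsonld_script(article_ld))
```

The same helper works for the WebSite, Organization, and FAQPage schemas; only the dictionary changes.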
Experiment 2: llms.txt
The llms.txt standard (proposed by Jeremy Howard of Answer.AI) tells AI models what your site is about and what content is available. We added a comprehensive llms.txt file. Result: too early to measure impact. The standard is new, and it's unclear how many crawlers check for it. We'll report back when we have more data.
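For reference, a minimal llms.txt in the proposed format: an H1 title, a blockquote summary, then sections of annotated links. The site name and URLs below are placeholders:

```markdown
# Example Site

> A one-line summary of what the site covers and who it's for.

## Docs

- [Getting started](https://example.com/start.md): overview of the product
- [API reference](https://example.com/api.md): endpoint documentation

## Optional

- [Changelog](https://example.com/changelog.md): release history
```

The file lives at the site root (/llms.txt), alongside robots.txt.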
Experiment 3: Content Volume vs Quality
We tested two approaches: 50 thin glossary pages (~200 words each) vs 4 deep comparison articles (~1500 words each). The deep articles attracted 3x more repeat bot visits. Bots that found the comparison articles crawled deeper into the site (3-4 pages per session vs 1-2 for glossary).
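The crawl-depth comparison comes from grouping access-log hits by bot and session and counting distinct pages. A sketch, assuming a simplified log format of (bot, session, path) tuples rather than raw log lines:

```python
from collections import defaultdict

# Hypothetical simplified access-log records: (bot_name, session_id, path).
hits = [
    ("GPTBot", "s1", "/compare/a-vs-b"),
    ("GPTBot", "s1", "/compare/b-vs-c"),
    ("GPTBot", "s1", "/pricing"),
    ("GPTBot", "s2", "/glossary/term-1"),
]

def pages_per_session(hits):
    """Count distinct pages each bot fetched per session — the crawl-depth
    metric behind the 3-4 vs 1-2 pages-per-session comparison above."""
    sessions = defaultdict(set)
    for bot, session, path in hits:
        sessions[(bot, session)].add(path)
    return {key: len(paths) for key, paths in sessions.items()}

print(pages_per_session(hits))  # → {('GPTBot', 's1'): 3, ('GPTBot', 's2'): 1}
```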
Experiment 4: External Signals
The most important factor wasn't on-site at all. Posting links to Hacker News and Reddit drove bot visits within hours — not just from users clicking, but from bots that monitor those platforms for new URLs to crawl. Social signals appear to be the strongest trigger for AI crawler visits.
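Attributing those post-submission visits to bots rather than humans only needs a User-Agent check. A sketch; the token list is partial and illustrative, not a complete inventory of AI crawlers:

```python
# Known crawler User-Agent tokens (a partial, illustrative list).
BOT_TOKENS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Googlebot", "bingbot"]

def classify_agent(user_agent: str) -> str:
    """Return the matching bot token, or 'human/other' if none matches."""
    for token in BOT_TOKENS:
        if token.lower() in user_agent.lower():
            return token
    return "human/other"

print(classify_agent("Mozilla/5.0; compatible; GPTBot/1.2"))  # → GPTBot
print(classify_agent("Mozilla/5.0 (X11; Linux) Firefox/120"))  # → human/other
```

Running this over access logs in the hours after a Hacker News or Reddit post makes the bot-driven spike easy to see.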
What Doesn't Work
Hidden links (honeypot traps) get indexed but don't attract more bots. Keyword stuffing is ignored by AI crawlers (they're not search engines). Auto-generated thin content gets crawled once and never revisited. The bots are smarter than we expected about content quality.
Recommendations
If you want AI bots to crawl your site: (1) create genuinely useful content, (2) add Schema.org markup, (3) maintain an up-to-date sitemap, (4) add llms.txt, (5) get external links from platforms bots monitor. Quality and external signals matter more than on-site tricks.
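Point (3) is easy to automate. A minimal sketch of generating a sitemap.xml with Python's standard library; the URL and date are placeholders:

```python
import xml.etree.ElementTree as ET

def build_sitemap(urls):
    """Build a minimal sitemap.xml document from (loc, lastmod) pairs."""
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for loc, lastmod in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")

print(build_sitemap([("https://example.com/compare/a-vs-b", "2025-01-01")]))
```

Regenerating this on every publish keeps the lastmod dates fresh, which is one of the signals crawlers use to schedule revisits.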