2026-05-10|5 min read

Is Your robots.txt Blocking AI? How to Welcome Recommendation Engine Crawlers

Many businesses unknowingly block GPTBot, ClaudeBot, and PerplexityBot in their robots.txt. Here is how to check and fix it.

Your robots.txt file is a small text file at the root of your website that tells web crawlers which pages they are allowed to access. It has been a standard part of SEO for decades. But with the rise of AI recommendation engines, it has taken on a new and critical role.

The Problem: Default Blocking

Some website platforms, security plugins, and hosting providers add broad crawler restrictions to robots.txt by default. A 'User-agent: *' line followed by a 'Disallow: /' line blocks every compliant crawler, AI ones included. Businesses also block AI crawlers deliberately, either because they heard about copyright concerns or because they did not understand the effect on their visibility.

When GPTBot (OpenAI), ClaudeBot (Anthropic), or PerplexityBot cannot access your website, they cannot index your content. This means AI engines cannot learn about your business, cannot verify information from other sources against your site, and are less likely to recommend you.
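To make the effect of a catch-all rule concrete, here is a minimal sketch using Python's standard-library robots.txt parser. It only approximates how a real crawler reads the file (each vendor has its own implementation), but it shows that a wildcard group applies to AI crawlers that are never named.

```python
from urllib.robotparser import RobotFileParser

# A catch-all rule as it appears in a real file: one directive per line.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(BLOCK_ALL.splitlines())

# None of these crawlers is named in the file, yet the wildcard group
# applies to all of them, so each check prints False.
for bot in ("GPTBot", "ClaudeBot", "PerplexityBot"):
    print(bot, parser.can_fetch(bot, "/"))
```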

The AI Crawlers You Should Know About

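GPTBot: OpenAI's crawler, which collects content that may be used to train the models behind ChatGPT.
ClaudeBot: Anthropic's crawler for Claude.
Google-Extended: Google's control token for AI use of your content. It is not a separate crawler; Googlebot does the fetching, and this token only governs whether the content may be used for Google's AI models.
PerplexityBot: Perplexity's web indexing crawler.
CCBot: Common Crawl's bot, used by many AI training datasets.
Bytespider: ByteDance's crawler.
cohere-ai: Cohere's training crawler.
meta-externalagent: Meta's AI crawler.
Applebot-Extended: Apple's control token for AI features. Like Google-Extended, it does not crawl on its own; it governs how content fetched by Applebot may be used.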

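A crawler with no matching rule is allowed by default, so the essential step is making sure none of these tokens is disallowed. Explicitly allowing each one in your robots.txt makes the intent unambiguous if you want maximum AI visibility.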

How to Check Your robots.txt

Visit yourdomain.com/robots.txt in a browser. Look for any Disallow rules that apply to the AI crawlers listed above. If you see a 'User-agent: GPTBot' line followed by 'Disallow: /', or a similar pair for another AI crawler, your AI visibility is being limited by your own configuration.

Also check for overly broad rules such as 'User-agent: *' followed by 'Disallow: /', which block all crawlers, including AI ones.
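The manual check above can also be scripted. The following is a rough sketch, not a definitive audit: the function name blocked_crawlers and the sample file are made up for illustration, and Python's built-in parser matches user-agent names by substring and applies the first matching rule, which can differ from how a given crawler interprets the same file.

```python
from urllib.robotparser import RobotFileParser

# Tokens from the list in this article, hyphenated as they appear in robots.txt.
AI_CRAWLERS = [
    "GPTBot", "ClaudeBot", "Google-Extended", "PerplexityBot", "CCBot",
    "Bytespider", "cohere-ai", "meta-externalagent", "Applebot-Extended",
]

def blocked_crawlers(robots_txt: str, path: str = "/") -> list[str]:
    """Return the AI crawler tokens that may not fetch `path` (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_CRAWLERS if not parser.can_fetch(bot, path)]

# Illustrative file: GPTBot is shut out, everyone else only loses /admin/.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

print(blocked_crawlers(sample))  # ['GPTBot']
```

To audit a live site, download the text of its /robots.txt yourself and pass it to the function in place of the sample.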

The Fix: A Recommended robots.txt for AI Visibility

Your robots.txt should explicitly welcome each AI crawler while still protecting sensitive pages (admin areas, API routes, user dashboards). Allow the root and public content pages. Disallow only genuinely private paths.
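As one possible shape for such a file, the sketch below generates a robots.txt and checks it with Python's standard-library parser. The private paths (/admin/, /api/, /dashboard/) and the helper name build_robots_txt are assumptions for illustration; substitute the paths your site actually uses.

```python
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = [
    "GPTBot", "ClaudeBot", "Google-Extended", "PerplexityBot", "CCBot",
    "Bytespider", "cohere-ai", "meta-externalagent", "Applebot-Extended",
]
PRIVATE_PATHS = ["/admin/", "/api/", "/dashboard/"]  # hypothetical paths

def build_robots_txt(crawlers, private_paths):
    """Build one group for the named AI crawlers and one for everyone else."""
    rules = [f"Disallow: {path}" for path in private_paths]
    # Disallow lines come before "Allow: /" so that first-match parsers and
    # longest-match parsers reach the same answer for private paths.
    rules.append("Allow: /")
    lines = [f"User-agent: {bot}" for bot in crawlers] + rules
    lines += ["", "User-agent: *"] + rules
    return "\n".join(lines) + "\n"

robots_txt = build_robots_txt(AI_CRAWLERS, PRIVATE_PATHS)
print(robots_txt)

# Sanity check with the standard-library parser.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("GPTBot", "/"))             # True
print(parser.can_fetch("GPTBot", "/admin/users"))  # False
```

The private paths are repeated inside the AI crawler group on purpose: a crawler that finds a group naming it ignores the wildcard group, so rules listed only under 'User-agent: *' would not protect those paths from the named crawlers.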

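Additionally, consider adding an llms.txt file (following the llmstxt.org specification) that gives AI engines a structured, machine-readable summary of your business. Serve it at /llms.txt in the root of your site. Some guides suggest referencing it from robots.txt with an Llms-Txt line, but that is not a directive defined by the robots.txt standard (RFC 9309), so do not rely on crawlers reading it.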
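For a sense of what the file contains, here is a small sketch that renders a minimal llms.txt in the layout the llmstxt.org proposal describes: a title heading, a blockquote summary, and sections of annotated links. The business name, URLs, and the helper build_llms_txt are all invented for illustration.

```python
def build_llms_txt(name, summary, sections):
    """Render a minimal llms.txt: H1 title, blockquote summary, H2 link lists."""
    parts = [f"# {name}", "", f"> {summary}"]
    for heading, links in sections.items():
        parts += ["", f"## {heading}", ""]
        parts += [f"- [{title}]({url}): {note}" for title, url, note in links]
    return "\n".join(parts) + "\n"

# Placeholder business and URLs; replace with your own pages.
llms_txt = build_llms_txt(
    "Example Bakery",
    "Family-run bakery selling bread and custom cakes.",
    {
        "Key pages": [
            ("Menu", "https://example.com/menu.md", "Products and prices"),
            ("Contact", "https://example.com/contact.md", "Address and opening hours"),
        ],
    },
)
print(llms_txt)
```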

Beyond robots.txt

Fixing your robots.txt is necessary but not sufficient. It removes a barrier but does not actively promote your business to AI engines. For that, you also need structured data, directory presence, entity-rich content, and regular AI visibility monitoring. Think of robots.txt as opening the door. Everything else is what makes AI want to walk through it.

Check Your AI Visibility Now

See which recommendation engines can find your business. Free audit, no signup, results in under 4 minutes.
