Technical · GEO

robots.txt & AI Crawlers: What to Block, What to Allow?

February 2026 · 8 min. read · GEO, LLMO, AI SEO

Why robots.txt Is Crucial for AI

Your robots.txt is the first file that an AI crawler checks on your website. If there is a Disallow: / for GPTBot, ChatGPT cannot crawl your website – your content will then not be considered for recommendations and answers.

The problem: Many websites block AI crawlers without knowing it. Some hosting providers set blanket blocks, some CMS updates add new rules, and some SEO plugins block "unknown" bots by default. The result: your brand simply does not exist for AI assistants.

Did you know? Over 30% of top websites block at least one AI crawler in their robots.txt – often unintentionally through wildcard rules.

All 13 AI crawlers at a glance

There are now 13 major AI crawlers that scan websites. Each belongs to a different provider and serves a different purpose:

Crawler             Provider       Purpose                 Recommendation
GPTBot              OpenAI         Training + browsing     Allow
ChatGPT-User        OpenAI         Live browsing in chat   Allow
Google-Extended     Google         Gemini training         Allow
Googlebot           Google         Search + AI Overviews   Essential
anthropic-ai        Anthropic      Claude training         Allow
ClaudeBot           Anthropic      Claude browsing         Allow
PerplexityBot       Perplexity     Real-time search        Allow
Applebot-Extended   Apple          Apple Intelligence      Allow
Meta-ExternalAgent  Meta           Meta AI training        Consider
Bytespider          ByteDance      TikTok AI               Consider
CCBot               Common Crawl   Open archive            Consider
cohere-ai           Cohere         Enterprise AI           Allow
Amazonbot           Amazon         Alexa + shopping        Allow
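
The table above can also be checked programmatically. A minimal sketch using Python's standard-library urllib.robotparser (the crawler list comes from the table; the sample robots.txt contents are illustrative):

```python
import urllib.robotparser

# The 13 AI crawler user agents from the table above
AI_CRAWLERS = [
    "GPTBot", "ChatGPT-User", "Google-Extended", "Googlebot",
    "anthropic-ai", "ClaudeBot", "PerplexityBot", "Applebot-Extended",
    "Meta-ExternalAgent", "Bytespider", "CCBot", "cohere-ai", "Amazonbot",
]

def check_crawlers(robots_txt: str, url: str = "https://example.com/") -> dict:
    """Return {crawler_name: allowed} for a robots.txt given as a string."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return {bot: rp.can_fetch(bot, url) for bot in AI_CRAWLERS}

# A blanket wildcard block, as some hosts ship it by default:
blocked = check_crawlers("User-agent: *\nDisallow: /\n")
print(blocked["GPTBot"])   # False – ChatGPT cannot crawl this site

# An open configuration:
allowed = check_crawlers("User-agent: *\nAllow: /\n")
print(allowed["GPTBot"])   # True
```

To audit a live site, point RobotFileParser at the real file with set_url("https://your-domain.com/robots.txt") and read() instead of parse().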

Block or Allow? Decision Guide

Approach 1: Allow All (recommended for Shops)

If you want AI assistants to recommend your products, allow all crawlers. This is the best approach for online stores and service providers. Every blocked crawler is an AI system that doesn't know your brand.

Approach 2: Selective Access

Allow the most important crawlers (GPTBot, ClaudeBot, PerplexityBot, Googlebot) and block what you don't need – useful if you have server load concerns.
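
A sketch of what such a selective configuration could look like (which crawlers you keep is your call; the groups below are examples):

```text
# Explicitly allow the important AI crawlers
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

# Block crawlers you don't need, e.g.:
User-agent: Bytespider
Disallow: /
```

Note that a crawler with no matching group is allowed by default, so the Allow groups mainly serve as explicit documentation; the Disallow groups do the actual blocking.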

Approach 3: Block Training, Allow Browsing

Allow live browsing (ChatGPT-User, PerplexityBot), block training crawlers (GPTBot, Google-Extended). This way you're visible in real-time answers without your content being used for model training.

⚠️ For shop owners: If you block GPTBot and ClaudeBot, these AI systems cannot recommend your products – even with a perfect llms.txt. The robots.txt takes precedence.

In Practice: Configuring robots.txt Correctly

Allow all AI crawlers (default)

# Allow all search engines and AI crawlers
User-agent: *
Allow: /

# Block only internal areas
Disallow: /admin/
Disallow: /warenkorb/
Disallow: /checkout/

Sitemap: https://www.your-domain.com/sitemap.xml

Block Training, Allow Browsing

# Block training crawlers
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Allow browsing crawlers
User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /
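
You can verify that such a configuration behaves as intended before deploying it, for example with Python's standard-library urllib.robotparser (the URL and the two-group rules file are illustrative):

```python
import urllib.robotparser

# A trimmed-down "block training, allow browsing" configuration
rules = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("GPTBot", "https://example.com/page"))        # False – training blocked
print(rp.can_fetch("ChatGPT-User", "https://example.com/page"))  # True – live browsing allowed
```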

5 Common Mistakes in AI Configuration

1. Wildcard blocks everything

The combination of User-agent: * with Disallow: / blocks all bots – including AI crawlers. This setup used to be standard at some hosting providers and is now fatal for AI visibility.

2. Outdated robots.txt after a CMS update

Some CMS updates overwrite the robots.txt or add new rules. Check after every update whether your AI crawler rules are intact.

3. Case sensitivity in User-Agent

Per RFC 9309, user-agent matching is supposed to be case-insensitive, but not every crawler follows the spec strictly, and a typo such as GTPBot will never match. Always use the official spelling, e.g. GPTBot.

4. No robots.txt present

No robots.txt is better than a poorly configured one – without the file all crawlers are allowed. But you miss the chance to protect internal areas.

5. CDN/firewall blocks bots

Cloudflare, Sucuri and other WAFs can block AI bots at server level before the robots.txt is read. Check the bot management settings.
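
One way to spot such a block is to request a page twice, once with a browser-like user agent and once identifying as an AI crawler, and compare the HTTP status codes. A minimal sketch in Python's standard library (the URLs are placeholders; run it against your own site):

```python
import urllib.error
import urllib.request

def status_as(url: str, user_agent: str, timeout: int = 10) -> int:
    """Fetch a URL identifying as the given user agent; return the HTTP status code."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code

# Example usage (against your own domain):
# browser = status_as("https://www.your-domain.com/", "Mozilla/5.0")
# bot = status_as("https://www.your-domain.com/", "GPTBot")
# If browser == 200 but bot == 403, a WAF is blocking the crawler
# before your robots.txt is ever read.
```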

How to check your robots.txt now

Instead of manually reading the robots.txt and checking 13 crawlers one by one, use our free tool:

robots.txt AI-Crawler Check

Checks in seconds which of the 13 AI crawlers can crawl your website – with visual status display and concrete recommendations.

Free check now →

No registration required · results in seconds

Interplay: robots.txt + llms.txt

robots.txt and llms.txt work together: robots.txt controls access – who can crawl? llms.txt provides the content – what should AI know about you? Without allowed access, the best llms.txt is useless.

The optimal 5-step workflow:

  1. robots.txt check – allow AI crawlers (→ Checker)
  2. Measure AI visibility – where do you stand? (→ Visibility Check)
  3. Schema.org check – is your structured data complete? (→ Schema Checker)
  4. Generate llms.txt – create AI-optimized files (→ Generator)
  5. Validate – are all files correct? (→ Validator)

Ready for Maximum AI Visibility?

Start with the robots.txt check – then our funnel guides you step by step through all optimizations.

Check your robots.txt now →