What you'll learn
- Keep admin, API, and private paths disallowed for all user agents you list.
- Reference your sitemap and, when applicable, your llms.txt location.
- Treat robots.txt as intent, not a security boundary: some bots ignore it.
robots.txt tells well-behaved crawlers which paths they should skip. Search engines generally respect it. Some AI crawlers may still fetch URLs for training or retrieval, so combine robots rules with authentication for anything truly private.
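To see what "well-behaved" means in practice, here is a minimal Python sketch (standard library only) of the check a compliant crawler runs before fetching a URL. The domain and paths are placeholders for your own site; a bot that ignores robots.txt simply never runs this step, which is why authentication still matters for private routes.

from urllib.robotparser import RobotFileParser

# Placeholder URL: point this at your live robots.txt.
rp = RobotFileParser("https://yourdomain.com/robots.txt")
rp.read()

# A compliant crawler asks before fetching; a non-compliant one skips this entirely.
print(rp.can_fetch("GPTBot", "https://yourdomain.com/admin/settings"))  # expect False if /admin/ is disallowed
print(rp.can_fetch("Googlebot", "https://yourdomain.com/pricing"))      # expect True for public pages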
What to block everywhere
- Admin and account areas
- Internal APIs and webhooks
- Draft or staging paths
- User-specific or PII-heavy routes
Template you can adapt
Start from one block per major crawler family, then a User-agent: * fallback. Keep rules consistent: a crawler that matches a named User-agent group ignores the * group entirely, so you can accidentally allow GPTBot a path you blocked for everyone else unless each named block repeats every Disallow you care about.
# Example skeleton: replace paths with yours
User-agent: Googlebot
Allow: /
Disallow: /admin/
Disallow: /api/
User-agent: GPTBot
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /account/
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/
Sitemap: https://yourdomain.com/sitemap.xml

If your stack documents an LLMS or similar line for discovery files, add it only when valid. Example: LLMS: https://yourdomain.com/llms.txt
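One way to catch drift between blocks is to script the consistency check. The sketch below uses Python's urllib.robotparser against a hypothetical list of sensitive paths and user agents; adjust both lists to match your own rules before relying on it.

from urllib.robotparser import RobotFileParser

# Hypothetical lists: the paths every block should disallow, and the agents you define.
SENSITIVE_PATHS = ["/admin/", "/api/"]
AGENTS = ["Googlebot", "GPTBot", "*"]

rp = RobotFileParser("https://yourdomain.com/robots.txt")
rp.read()

for agent in AGENTS:
    for path in SENSITIVE_PATHS:
        if rp.can_fetch(agent, "https://yourdomain.com" + path):
            print("WARNING: {} may fetch {}".format(agent, path))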
AI crawlers and compliance
Publishing Allow rules for AI bots signals that you invite fetches of your public pages. If you need to opt out of specific training uses, use vendor-specific mechanisms where available in addition to robots.txt, and have legal review keep your public statements aligned with your data policy.
Testing
Fetch https://yourdomain.com/robots.txt in the browser. Fix syntax errors (typos in Disallow, wrong user-agent names). Use crawl tools or curl with a custom User-Agent header to spot-check critical URLs.
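For the spot-check itself, a short Python script can impersonate a crawler's User-Agent and confirm that a disallowed path is also protected by authentication, not just by robots.txt. The URL and agent string below are placeholders; a 401 or 403 is the answer you want for private routes.

import urllib.error
import urllib.request

# Placeholder URL and agent string: substitute a route you expect to be private.
req = urllib.request.Request(
    "https://yourdomain.com/admin/",
    headers={"User-Agent": "GPTBot"},
)
try:
    with urllib.request.urlopen(req) as resp:
        print(resp.status, resp.url)  # a 200 on the admin URL itself means robots.txt is the only barrier
except urllib.error.HTTPError as err:
    print(err.code)  # 401 or 403 means the route is actually protected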
Operational tips
- Update robots.txt when you add new app sections.
- Keep sitemap URL current if your sitemap path changes.
- Document internally who owns robots.txt edits so automated deploys do not overwrite custom rules.