Platinum.ai

Antti Pasila · 7 min read

Best Practice robots.txt for the AI Age

Balance search crawlers and AI bots: block sensitive paths, point to your sitemap and llms.txt, and test what you publish.

What you'll learn

  • Keep admin, API, and private paths disallowed for all user agents you list.
  • Reference your sitemap and, when applicable, your llms.txt location.
  • Treat robots.txt as intent, not a security boundary: some bots ignore it.

robots.txt tells well-behaved crawlers which paths they should skip. Search engines generally respect it. Some AI crawlers may still fetch URLs for training or retrieval, so combine robots rules with authentication for anything truly private.

What to block everywhere

  • Admin and account areas
  • Internal APIs and webhooks
  • Draft or staging paths
  • User-specific or PII-heavy routes

Template you can adapt

Start from one block per major crawler family, then a User-agent: * fallback. A crawler that matches a named group ignores the * group entirely, so repeat every shared Disallow in each named block; otherwise you can accidentally allow GPTBot a path you blocked under *.

# Example skeleton: replace paths with yours
User-agent: Googlebot
Allow: /
Disallow: /admin/
Disallow: /api/

User-agent: GPTBot
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /account/

User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/

Sitemap: https://yourdomain.com/sitemap.xml

robots.txt has no standard directive for discovery files, and compliant parsers ignore lines they do not recognize. If a consumer you care about documents an LLMS or similar line, add it in the documented form only. Example: LLMS: https://yourdomain.com/llms.txt
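The consistency rule above (every named group repeating the shared Disallows) can be spot-checked programmatically. A minimal sketch: the ROBOTS_TXT body and the REQUIRED path set are illustrative assumptions, and the parser is deliberately simplified (blank-line-separated groups, only User-agent and Disallow fields).

```python
# Sketch: flag user-agent groups missing required Disallow rules.
# ROBOTS_TXT and REQUIRED are illustrative assumptions, not a real config.

REQUIRED = {"/admin/", "/api/"}

ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /
Disallow: /admin/
Disallow: /api/

User-agent: GPTBot
Allow: /
Disallow: /admin/

User-agent: *
Disallow: /admin/
Disallow: /api/
"""

def groups(text):
    """Yield (agents, disallowed_paths) per blank-line-separated rule group."""
    agents, disallows = [], set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            if agents:                       # blank line closes a group
                yield agents, disallows
                agents, disallows = [], set()
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            agents.append(value)
        elif field == "disallow" and value:
            disallows.add(value)
    if agents:
        yield agents, disallows

def missing_rules(text, required=REQUIRED):
    """Return {agent: missing_paths} for groups lacking a required Disallow."""
    problems = {}
    for agents, disallows in groups(text):
        gap = required - disallows
        for agent in (agents if gap else []):
            problems[agent] = sorted(gap)
    return problems

print(missing_rules(ROBOTS_TXT))  # → {'GPTBot': ['/api/']}
```

Here the check catches that the GPTBot group forgot Disallow: /api/, exactly the drift the template section warns about; a one-line call in CI keeps the groups honest.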

AI crawlers and compliance

Publishing Allow for AI bots means you are inviting fetches of public pages. If you need to opt out of certain training uses, use vendor-specific mechanisms where available in addition to robots.txt. Your legal team should align public marketing with your data policy.

Testing

Fetch https://yourdomain.com/robots.txt in a browser to confirm it is served with the content you deployed. Fix syntax errors (typos in Disallow, misspelled user-agent names), then use crawl tools or curl with a custom User-Agent header to spot-check critical URLs.
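The spot-checking step can also be done offline with Python's standard-library parser. A sketch, with placeholder rules and URLs: note that urllib.robotparser applies rules in file order, so lean on specific Disallow lines rather than a leading Allow: /.

```python
# Sketch: verify which URLs a given user agent may fetch under your rules.
# The rules and URLs below are placeholders, not a recommendation.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: GPTBot
Disallow: /admin/
Disallow: /api/

User-agent: *
Disallow: /admin/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())  # parse local text instead of fetching over HTTP

print(rp.can_fetch("GPTBot", "https://yourdomain.com/admin/users"))   # False
print(rp.can_fetch("GPTBot", "https://yourdomain.com/blog/post"))     # True
print(rp.can_fetch("SomeOtherBot", "https://yourdomain.com/api/v1"))  # True: * group only blocks /admin/
```

Running this against your real file before deploy catches the common failure modes in one place: a typo'd Disallow silently matching nothing, or a named group that no longer mirrors the * fallback.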

Operational tips

  • Update robots.txt when you add new app sections.
  • Keep sitemap URL current if your sitemap path changes.
  • Document internally who owns edits so deploys do not wipe custom files.