Platinum.ai
Infrastructure · Antti Pasila · 6 min read

Your llms.txt Is Probably Garbage. Here's the Data.

We scanned 69 small business sites across 6 industries. 24 had llms.txt. 13 of those were Shopify stores producing near-identical auto-generated templates. Only 4 were properly structured. Everything you've heard about 'everyone has llms.txt now' is missing the point. Here's the data and how to fix it.

Key takeaways

  • 13 Shopify stores produce near-identical llms.txt files: all ~4,200 characters, identical 'Agent Instructions — [Brand]' template. AI sees these brands as interchangeable.
  • We checked 69 real small business websites. 35% had llms.txt. Of those 24 files, only 4 were properly structured. More than half (13) are cookie-cutter templates.
  • 8 sites returned what looked like llms.txt but were actually parked domain listings, WAF blocks, or HTML error pages. Scanners that don't validate content type will report these as real files.
  • Chrome now checks for llms.txt by default. A bad file is worse than no file: it gives the AI confident-sounding information that's wrong.

Two weeks ago, we scanned 69 small business websites across six industries to see how ready they were for AI-driven discovery. The headline number looked encouraging: 35% had an llms.txt file. Not bad for a file format that's barely a year old.

Then we went deeper. And ran the scan again after fixing our own scanner.

We pulled every llms.txt file from those sites and analyzed each one manually. We checked for the four core sections every good file should have: an overview of the business, a list of services or products, key facts and contact info, and a set of key page links. We checked for structured tables. For canonical excerpts that AI models can quote verbatim. For proper formatting with headers and metadata.

What we found changed how we think about this whole space.

Most llms.txt files aren't real llms.txt files

The first thing we had to fix was our own scanner. Of the 35 files we initially flagged as llms.txt, 8 were complete noise:

  • 3 restaurant sites returned WAF block pages. 'Unauthorized Activity Detected.' HTML served as text/html.
  • 5 lawyer and dental domains returned parked GoDaddy 'for sale' pages. They looked like llms.txt at first glance: markdown-formatted content starting with '# www.[domain].com'. Read the fine print and it's all about how to buy the domain on GoDaddy aftermarket.

Our original scanner accepted anything with HTTP 200 and more than 10 characters. Bad assumption. We fixed it: Platinum now rejects HTML content, WAF blocks, and parked domain pages. If a scanner doesn't validate content type, its numbers are inflated.
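
To make that concrete, here is a minimal sketch of the kind of check we mean. It is not Platinum's actual code. The function name and the marker phrases are assumptions for illustration (only "Unauthorized Activity Detected" comes from the pages we saw), and it assumes a real file contains at least one markdown H1 line, as the llms.txt proposal expects.

```python
import re

# Marker phrases are illustrative assumptions, apart from the WAF message
# quoted in this post. A real scanner would need a longer, tested list.
BLOCK_MARKERS = ("unauthorized activity detected",)
PARKED_MARKERS = ("godaddy", "domain is for sale", "buy this domain")


def looks_like_llms_txt(status: int, content_type: str, body: str) -> bool:
    """Return True only if the response plausibly is a real llms.txt file."""
    if status != 200:
        return False
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "text/html":
        return False
    text = body.strip()
    lowered = text.lower()
    # HTML bodies sometimes arrive with a misleading Content-Type header.
    if lowered.startswith(("<!doctype", "<html")) or "<body" in lowered:
        return False
    # Reject WAF block pages and parked-domain listings by phrase.
    if any(marker in lowered for marker in BLOCK_MARKERS + PARKED_MARKERS):
        return False
    # Assumption: a real file has at least one markdown H1 line.
    return re.search(r"^# \S", text, flags=re.MULTILINE) is not None
```

Phrase matching like this is blunt: a legitimate file that mentions GoDaddy would be rejected, so treat the marker list as a starting point rather than a finished filter.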

After stripping out the noise: 24 sites had actual llms.txt. The quality picture was worse than the simple count suggested.

What we found: 4 decent files out of 24

Of the 24 real llms.txt files:

  • 13 were Shopify stores producing near-identical auto-generated templates. All ~4,200 characters with the same structure.
  • 4 were properly structured with meaningful business content.
  • 3 had substantial text (17,000+ characters) but almost no section structure: long walls of text with barely any headers.
  • 1 was barely 900 characters: technically a file, practically useless.
  • 3 were no longer accessible: blocked, removed, or empty.
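
For readers who want the shares rather than the raw counts, the arithmetic can be sketched as below. The counts come straight from the list above; the labels and the `share` helper are ours, added for illustration.

```python
# Counts taken from the breakdown above; labels are shortened for display.
breakdown = {
    "Shopify template": 13,
    "properly structured": 4,
    "long text, little structure": 3,
    "under 1,000 characters": 1,
    "no longer accessible": 3,
}

total_files = sum(breakdown.values())
sites_scanned = 69


def share(count: int, total: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


for label, count in breakdown.items():
    print(f"{label}: {count} ({share(count, total_files)}% of files)")

print(f"sites with a real file: {share(total_files, sites_scanned)}%")
print(f"sites with a well-structured file: {share(breakdown['properly structured'], sites_scanned)}%")
```

The last line is the one that matters: measured against all 69 sites rather than the 24 files, the well-structured share drops to a single-digit percentage.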

Let that sink in. Out of 24 sites that did the work to publish llms.txt, only 4 got it right.

The cookie-cutter trap

Thirteen direct-to-consumer brands in our scan had llms.txt files: Brooklinen, Frank And Oak, Parachute Home, Tentree, Pistol Lake, Unbound Merino, Grove Collaborative, Public Goods, Saalt, Milk Bar, Boll & Branch, Frankies 457, and Plowz & Mowz.

Every single file uses the same "Agent Instructions — [Brand]" template at around 4,200 characters, with the brand name swapped and a thin list of product categories dropped in. Same structure. Same missing sections. Same result.

This is not an accident. These brands are all on Shopify. Their ecommerce platform auto-generates an llms.txt file that checks the box but adds zero value. An AI reading these files gets a list of product categories and nothing else: no overview of what the brand stands for, no key facts, no contact details, no structured way to compare them to competitors.
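
One rough way to spot this kind of templating is to mask the brand name in each file and compare what is left. The sketch below uses Python's standard `difflib` for that. The helper names and the two sample files are hypothetical, not real scan data, and this is not necessarily how our scanner does it.

```python
from difflib import SequenceMatcher


def normalize(text: str, brand: str) -> str:
    """Replace the brand name with a placeholder and collapse whitespace."""
    return " ".join(text.replace(brand, "[Brand]").split())


def template_similarity(a: str, brand_a: str, b: str, brand_b: str) -> float:
    """Similarity ratio in [0, 1] between two files after brand masking."""
    matcher = SequenceMatcher(None, normalize(a, brand_a), normalize(b, brand_b))
    return matcher.ratio()


# Hypothetical file contents for two made-up brands, not real scan data.
file_a = "# Instructions for Acme\n\nAcme sells: Sheets, Towels"
file_b = "# Instructions for Globex\n\nGlobex sells: Shirts, Towels"

score = template_similarity(file_a, "Acme", file_b, "Globex")
print(f"similarity: {score:.2f}")
```

A ratio close to 1.0 after masking suggests the two files share a template and differ only in the details dropped into it. Where to draw the cutoff is a judgment call.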

The template trap is worse than having no file at all. Thirteen brands that each spent real money building distinct identities online are now indistinguishable to any AI agent that reads their llms.txt. Same file. Same template. Same story.

What good looks like

Of the four well-structured files, one stands out. Shack Shine, a home services company, has all four core sections present with clean structure. Their llms.txt is 5,347 characters of well-organized business information: a proper overview, detailed service descriptions, locations and contact info, and clear page links. An AI reading it walks away with a real picture of the business: what they do, where they operate, how to contact them.

That's what every llms.txt should look like.
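
As an illustration of that shape, here is a small sketch that renders the four core sections from plain data. The heading names, the function, and the business details are all placeholders we made up. This is not Shack Shine's file, and nothing in the llms.txt proposal mandates these exact headings.

```python
def render_llms_txt(name, overview, services, facts, links):
    """Render the four core sections as markdown text."""
    lines = [f"# {name}", "", "## Overview", overview, "", "## Services"]
    lines += [f"- {title}: {detail}" for title, detail in services]
    lines += ["", "## Key facts and contact"]
    lines += [f"- {label}: {value}" for label, value in facts]
    lines += ["", "## Key pages"]
    lines += [f"- [{title}]({url})" for title, url in links]
    return "\n".join(lines) + "\n"


# Placeholder business, invented for illustration only.
example = render_llms_txt(
    name="Example Window Cleaning",
    overview="Example Window Cleaning washes residential windows and gutters in Springfield.",
    services=[("Window washing", "interior and exterior, priced per pane")],
    facts=[("Phone", "555-0100"), ("Hours", "Mon-Fri 8:00-17:00")],
    links=[("Services", "https://example.com/services")],
)
print(example)
```

Generating the file from structured data has a side benefit: the overview, services, facts, and links cannot silently go missing, because the function will not run without them.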

Why this stops being optional in 2026

Chrome now checks for llms.txt by default. This isn't a future scenario. It's already happening. When the world's dominant browser starts probing for machine-readable site profiles, the quality of those profiles stops being a nice-to-have and starts being a ranking signal.

AI agents are the fastest-growing user group on the web. They don't browse pages. They consume structured data. A bad llms.txt is worse than a missing one: it gives the AI confident-sounding information that's incomplete or wrong. The model will cite your bad file, confidently, to its user. You don't get to correct it.

The sites we scanned that had llms.txt already did the hard part: they published the file. What's left is making it useful.

The four things every llms.txt needs

If you have an llms.txt, open it now. Check for these:

  1. An overview section that says what your business actually does, in plain language, in 2-3 sentences. No marketing fluff. Just facts an AI can quote.
  2. A services or products section. Not just names. Enough detail that a model can understand whether you're the right result for a user query.
  3. Key facts and contact info. Hours, locations, phone numbers, email addresses. The things people ask AI assistants for.
  4. Key page links that point to the most important pages on your site. AI agents use these to decide which pages to crawl deeper.
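
If you would rather check this programmatically, the four items above can be sketched as a heading audit. The keyword patterns are assumptions we picked for illustration. A file can cover a topic under a heading these patterns miss, so read a "missing" result as a prompt to look, not a verdict.

```python
import re

# Hypothetical keyword patterns per section; tune these for your own files.
SECTION_PATTERNS = {
    "overview": r"overview|about",
    "services_or_products": r"services?|products?",
    "facts_and_contact": r"contact|facts|hours|locations?",
    "key_pages": r"pages|links",
}


def audit_sections(text: str) -> dict[str, bool]:
    """Report which of the four core sections appear as markdown headings."""
    headings = [
        match.group(1).lower()
        for match in re.finditer(r"^#{1,6}\s+(.+)$", text, flags=re.MULTILINE)
    ]
    return {
        section: any(re.search(pattern, heading) for heading in headings)
        for section, pattern in SECTION_PATTERNS.items()
    }


sample = "# Acme\n\n## Overview\nAcme fixes roofs.\n\n## Services\n- Roof repair\n"
report = audit_sections(sample)
missing = [name for name, present in report.items() if not present]
print(missing)  # ['facts_and_contact', 'key_pages']
```

The check only looks at headings, so a wall of text with no section structure fails all four even if the information is buried somewhere inside it.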

Beyond those four, structured tables make a huge difference. They let models extract and compare information reliably. Canonical excerpts (short, quotable statements about your business formatted for verbatim citation) are what separate a file that gets referenced from one that gets ignored.

Most of the files we analyzed had none of this. They were quick exports from a template, or a plain text dump of an About page, or, in too many cases, a few dozen characters of nothing.

What's next

The llms.txt standard is barely a year old. It's going to evolve fast now that browsers are checking for it. The sites that publish quality files now, before doing so becomes table stakes, are the ones AI models will cite for years.

We built Platinum because we saw this coming. Our scanner checks whether your site has llms.txt, then analyzes the quality of what's there. If your file is thin, templated, or just plain wrong, we show you exactly what's missing and help you build a proper one.

Check your site at platinum.ai. It's free, and it takes about five seconds. If your llms.txt is one of the thousands that look like everyone else's, you'll know exactly what to fix.