Platinum.ai

Infrastructure · Platinum.ai · 8 min read

What Happens When an AI Agent Tries to Read Your Website (And Why It Gives Up)

Walk through the actual process an AI agent follows when it researches your business: HTML fetch, content extraction, token conversion, cost evaluation - and the decision to skip you or dig deeper.


Key takeaways

  • Agents extract roughly 18% of your raw HTML as usable text - the rest is visual code, navigation, and scripts.
  • A typical multi-page research session costs 5,000 to 15,000 tokens. An AI Website Profile: under 1,500.
  • Agents are economically rational. When comparing vendors, they allocate deeper analysis to the cheapest-to-read businesses.

When someone asks an AI assistant to recommend a plumber, compare accounting firms, or find a restaurant for a business dinner, the agent does not open a browser and scroll through your homepage the way a human would. It runs a systematic research process with real computational constraints. Understanding that process is the key to understanding why some businesses get recommended and others get silently skipped.

Round 1: the homepage fetch

The agent starts by fetching your homepage as raw HTML. It does not render your beautiful design, your animated hero section, or your carefully chosen images. It receives the source code: thousands of lines of HTML, CSS classes, JavaScript bundles, navigation markup, cookie consent banners, third-party tracking scripts, and - somewhere in the middle - your actual business content.

The agent then strips the visual code. It extracts text content from the HTML, discarding everything a human's eyes would parse visually but a language model cannot use. In our testing, this extraction typically yields about 18 percent of the raw page weight as usable text. An 80KB homepage produces roughly 14,400 characters of extracted content - and much of that is navigation labels, footer links, and boilerplate, not business facts.
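The strip-and-extract step can be made concrete with a minimal sketch that uses only Python's standard-library `html.parser`. It is not how any particular agent works. It drops the contents of `<script>`, `<style>`, and `<noscript>` elements and keeps every other text node. It then reports the extracted share of the raw HTML by character count. The sample page and business name are hypothetical, and the 18 percent figure above comes from Platinum.ai's testing, not from this code.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of script/style/noscript."""

    SKIP_TAGS = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(parser.chunks)


def extraction_ratio(html: str) -> float:
    """Extracted text length as a share of raw HTML length (in characters)."""
    text = extract_text(html)
    return len(text) / len(html) if html else 0.0


# Hypothetical sample page.
sample = """<html><head><style>.hero{color:red}</style>
<script>trackVisitor();</script></head>
<body><nav><a href="/">Home</a></nav>
<h1>Acme Plumbing</h1><p>24/7 emergency repairs in Springfield.</p></body></html>"""

print(extract_text(sample))
print(f"{extraction_ratio(sample):.0%} of the raw HTML survives")
```

The navigation label "Home" survives extraction. Stripping markup removes code, not boilerplate, which is why much of the extracted text is still navigation labels and footer links.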

What happens when an agent reads your website

Fetch (raw HTML) → Strip (remove CSS/JS) → Extract (~18% text) → Tokenize (LLM tokens) → Reason (generate answer)

82% of your page weight is discarded, but the agent still has to fetch and process all of it.

Round 1 continued: following key pages

If the homepage provides enough signal, the agent follows internal links to three or four key pages: your services page, pricing page, about page, and maybe a top-level product page. Each page goes through the same fetch-strip-extract cycle. The agent is building a mental model of your business from the extracted text fragments.
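The sketch below shows how an agent might choose which internal links to follow, assuming a simple keyword heuristic. The `KEY_PAGE_HINTS` tuple and the four-page limit are illustrative assumptions based on the pages named above. Real agents decide with their own, undisclosed logic.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical heuristic: path fragments that suggest a "key" page.
KEY_PAGE_HINTS = ("service", "pricing", "about", "product")


class LinkCollector(HTMLParser):
    """Gathers the href of every <a> tag in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def key_pages(base_url: str, html: str, limit: int = 4) -> list[str]:
    """Return up to `limit` same-site URLs whose path contains a key-page hint."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    host = urlparse(base_url).netloc
    picked: list[str] = []
    for href in collector.hrefs:
        if len(picked) >= limit:
            break
        url = urljoin(base_url, href)  # resolve relative links
        parsed = urlparse(url)
        if parsed.netloc != host or url in picked:
            continue  # skip external links and duplicates
        if any(hint in parsed.path.lower() for hint in KEY_PAGE_HINTS):
            picked.append(url)
    return picked


homepage = (
    '<a href="/services">Services</a><a href="https://other.example/pricing">Ad</a>'
    '<a href="/pricing">Pricing</a><a href="/blog">Blog</a><a href="/about-us">About</a>'
)
print(key_pages("https://example.com/", homepage))
```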

This is where things get expensive. Each page adds tokens to the context window. Subpages typically weigh about 40 percent as much as your homepage, but they add up fast. After four or five pages, the agent has consumed 3,000 to 8,000 tokens on your business alone - and it has not started comparing you to anyone else yet.
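The figures in this section combine into a back-of-envelope estimate. The sketch assumes the common rule of thumb of about 4 characters per token for English text, though real tokenizers vary. It also treats 80KB as 80,000 bytes with roughly one character per byte. The 18 percent extraction ratio and the 40 percent subpage weight are the figures quoted above. Integer arithmetic keeps the results exact.

```python
# Assumptions: ~4 characters per token (a rule of thumb for English text;
# real tokenizers vary), plus the 18% extraction ratio and 40% subpage
# weight quoted in this post. Integer math keeps the results exact.
CHARS_PER_TOKEN = 4
EXTRACTION_PCT = 18
SUBPAGE_WEIGHT_PCT = 40


def page_tokens(raw_bytes: int) -> int:
    """Estimated tokens for one page of the given raw HTML size."""
    extracted_chars = raw_bytes * EXTRACTION_PCT // 100
    return extracted_chars // CHARS_PER_TOKEN


def session_tokens(homepage_bytes: int, subpages: int) -> int:
    """Homepage plus `subpages` pages, each ~40% of the homepage's weight."""
    subpage_bytes = homepage_bytes * SUBPAGE_WEIGHT_PCT // 100
    return page_tokens(homepage_bytes) + subpages * page_tokens(subpage_bytes)


for n in (3, 4):
    print(f"80KB homepage + {n} subpages: {session_tokens(80_000, n):,} tokens")
# 80KB homepage + 3 subpages: 7,920 tokens
# 80KB homepage + 4 subpages: 9,360 tokens
```

For the 80KB example homepage, the 14,400 extracted characters come to about 3,600 tokens. Adding three subpages lands at 7,920 tokens, near the top of the 3,000 to 8,000 range, and a fourth subpage pushes the total past 9,000. Lighter homepages land lower.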

Round 2: the deep dive (maybe)

If the agent's initial scan produces a coherent picture, it may do a second round: checking your FAQ for policy details, looking for pricing breakdowns, reviewing case studies. This can push the total to ten pages and 10,000 to 15,000 tokens.

But here is the critical insight: the agent is not obligated to do round two. If your homepage is confusing, your pricing is hidden in a PDF, or your services page is a wall of marketing language without concrete facts, the agent has a choice. It can spend more tokens trying to figure you out, or it can move on to the next vendor whose site is cleaner. Agents are designed to optimize. They almost always move on.
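The choice between skipping a vendor and digging deeper can be modeled as a budget problem. The sketch below is a hypothetical policy, not a description of any real agent. First it scans every vendor. Then it spends the remaining token budget on second-round deep dives, cheapest first, but only for vendors whose scan produced a coherent picture. The `Vendor` fields, vendor names, and token figures are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Vendor:
    name: str
    scan_tokens: int       # tokens already spent on the round-1 scan
    coherent: bool         # did round 1 yield a usable picture?
    deep_dive_tokens: int  # estimated extra tokens for round 2


def plan_deep_dives(vendors: list[Vendor], budget: int) -> list[str]:
    """Hypothetical policy: after scanning everyone, deep-dive coherent
    vendors cheapest-first while the token budget lasts."""
    remaining = budget - sum(v.scan_tokens for v in vendors)
    chosen: list[str] = []
    coherent = [v for v in vendors if v.coherent]  # confusing sites are skipped outright
    for vendor in sorted(coherent, key=lambda v: v.deep_dive_tokens):
        if vendor.deep_dive_tokens <= remaining:
            chosen.append(vendor.name)
            remaining -= vendor.deep_dive_tokens
    return chosen


vendors = [
    Vendor("busy-site", scan_tokens=3_000, coherent=True, deep_dive_tokens=6_000),
    Vendor("clean-site", scan_tokens=1_500, coherent=True, deep_dive_tokens=1_000),
    Vendor("confusing-site", scan_tokens=8_000, coherent=False, deep_dive_tokens=7_000),
]
print(plan_deep_dives(vendors, budget=15_000))  # ['clean-site']
print(plan_deep_dives(vendors, budget=20_000))  # ['clean-site', 'busy-site']
```

With a 15,000-token budget, only the cheap, coherent site gets a second round. The confusing site never does, however much budget is left.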

The cost calculation

At current API prices, mid-tier models cost roughly $3 per million input tokens. An agent researching your business at 10,000 tokens costs about 3 cents per visit. That seems trivial until you consider scale: an agent comparing ten vendors spends 30 cents on parsing alone. Multiply by thousands of queries per day across millions of users, and the economic pressure to be efficient is enormous.
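The cost arithmetic above fits in a small function. This sketch assumes a flat price per million input tokens, using the ~$3 mid-tier figure from this post. It ignores output tokens, caching discounts, and tiered pricing.

```python
def cost_cents(tokens: int, usd_per_million: float = 3.0) -> float:
    """Input-token cost in US cents at a flat per-million-token price."""
    return tokens * usd_per_million * 100 / 1_000_000


per_visit = cost_cents(10_000)     # one 10,000-token research session
ten_vendors = 10 * per_visit       # comparing ten vendors
profile_read = cost_cents(1_500)   # reading a 1,500-token profile instead

print(f"{per_visit:.2f}¢ per visit")            # 3.00¢ per visit
print(f"{ten_vendors:.1f}¢ for ten vendors")     # 30.0¢ for ten vendors
print(f"{profile_read:.2f}¢ per profile read")   # 0.45¢ per profile read
```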

This is why we built the agent research cost calculation into our Site Scan. When you scan your domain, we estimate how many tokens an agent burns researching you: pages visited, raw HTML weight, extraction ratio, token count, and dollar cost. We then show what that cost looks like with an AI Website Profile - typically a 90 to 99 percent reduction.

Estimated agent research cost per business

Without AI Profile
Pages crawled: ~10
Tokens consumed: ~10,000
Cost per visit: ~3.0¢
Hallucination risk: High
Cost for 10 comparisons: ~30¢

With AI Website Profile
Files read: 1
Tokens consumed: ~1,500
Cost per visit: ~0.45¢
Hallucination risk: Near zero
Cost for 10 comparisons: ~4.5¢

Based on mid-tier LLM pricing (~$3/M input tokens). Actual costs vary by model and provider.

What the agent actually wants

After all that parsing, the agent is looking for a handful of facts it can use in its response: what does this business do, who do they serve, what does it cost, where are they located, what proof exists that they are legitimate, and what policies apply. That is it. Everything else - your brand story, your team photos, your animated transitions - is noise the agent paid tokens to skip.
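One way to picture the agent's shopping list is as a record with one field per fact. This `BusinessFacts` schema is hypothetical. It is not a standard or any agent's internal format. It only shows how an agent could track which facts it still has to dig for after scraping a page.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class BusinessFacts:
    """Hypothetical record of the six facts an agent looks for (not a standard)."""
    what_they_do: Optional[str] = None
    who_they_serve: Optional[str] = None
    pricing: Optional[str] = None
    location: Optional[str] = None
    proof: Optional[str] = None     # e.g. licences, reviews, client names
    policies: Optional[str] = None  # e.g. refunds, cancellations, guarantees


def missing_facts(facts: BusinessFacts) -> list[str]:
    """Fields that are still empty, i.e. what the agent must keep digging for."""
    return [f.name for f in fields(facts) if not getattr(facts, f.name)]


# Hypothetical result of scraping a homepage: two facts found, four missing.
scraped = BusinessFacts(what_they_do="Residential plumbing", location="Springfield")
print(missing_facts(scraped))  # ['who_they_serve', 'pricing', 'proof', 'policies']
```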

This is the fundamental mismatch. Your website was built to persuade humans through design. The agent needs data it can reason about. These are not the same thing, and no amount of SEO bridges the gap. You need a separate layer built specifically for machine consumption.

The AI Website Profile alternative

An AI Website Profile at /llms.txt gives the agent everything it needs in one lightweight file: 2,000 to 6,000 characters of clean, structured text. No HTML to strip. No JavaScript to skip. No navigation labels to filter. No PDFs to fail on. The agent reads the file, gets a complete and verified picture of your business, and moves directly to reasoning and comparison.
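The sketch below renders a small profile that loosely follows the public llms.txt proposal: an H1 name, a blockquote summary, then H2 sections of bullet points. It estimates the profile's token count with the same rough 4-characters-per-token heuristic. The business, section names, and facts are hypothetical, and Platinum.ai's actual AI Website Profile format may differ.

```python
CHARS_PER_TOKEN = 4  # rough heuristic for English text; real tokenizers vary


def render_profile(name: str, summary: str, sections: dict[str, list[str]]) -> str:
    """Render an llms.txt-style profile: H1 name, blockquote summary,
    then one H2 section of bullet points per topic."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, facts in sections.items():
        lines.append(f"## {heading}")
        lines.extend(f"- {fact}" for fact in facts)
        lines.append("")
    return "\n".join(lines)


def estimated_tokens(text: str) -> int:
    return len(text) // CHARS_PER_TOKEN


# Hypothetical business and facts, for illustration only.
profile = render_profile(
    "Acme Plumbing",
    "Licensed residential plumber serving Springfield.",
    {
        "Services": ["Emergency repairs, 24/7", "Water heater installation"],
        "Pricing": ["Call-out fee: $89", "Hourly rate: $120"],
        "Policies": ["90-day workmanship guarantee"],
    },
)
print(profile)
print(estimated_tokens(profile), "tokens (estimated)")

# The 2,000-6,000 character range maps to roughly 500-1,500 tokens.
print(estimated_tokens("x" * 2_000), estimated_tokens("x" * 6_000))  # 500 1500
```

At 4 characters per token, the 2,000 to 6,000 character range works out to roughly 500 to 1,500 tokens. That is consistent with the under-1,500-token figure used throughout this post.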

The token cost drops from 10,000+ to under 1,500. The extraction accuracy goes from best-effort scraping to 100 percent - because you authored the content specifically for agents. The hallucination risk drops because the agent works from facts you stated directly instead of fragments it had to infer.

See it for yourself

Run a Site Scan on your domain from our homepage. We will show you exactly what an agent encounters: your page weight, extraction estimate, token count, and research cost. Then compare that to what the same agent would see with an AI Website Profile. The difference is not incremental. It is structural.