GeoSalience
exp-004 · Planned · AI-crawler hits

Does inclusion in llms-full.txt change crawl or citation?

llms-full.txt bundles our cornerstones as one fetchable text file. Does being in it actually get a page read and cited more than being left out? Keep one article in, hold a comparable one out, and measure. This is a proposal, and the 'hold out' arm has a real cost that we name up front.

Hypothesis
An article included in llms-full.txt is fetched and cited more by AI systems than a comparable article deliberately excluded from it.
What changed
Treatment article remains in llms-full.txt (status quo). Control article is temporarily removed from the llms-full.txt build for the window. Both pages' HTML, schema, and .md aliases are otherwise unchanged.
Metric
AI-crawler hits
Window
proposed 21 June 2026 to 21 July 2026

Status: proposed. llms-full.txt is unchanged and both articles are currently in it. The dates are proposed. The editor decides whether to run this and confirms the pairing.
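The hold-out described under "What changed" amounts to a build-time exclusion. The following is a minimal sketch of that idea, not the site's actual build: the function name `build_llms_full`, the slugs, and the `---` separator are all assumptions made for illustration.

```python
def build_llms_full(articles, held_out=frozenset()):
    """Concatenate cornerstone articles into one text bundle.

    articles: list of (slug, markdown_text) pairs in bundle order.
    held_out: slugs to leave out of the bundle for the experiment window.
    The separator and the slug names are hypothetical.
    """
    parts = [text.strip() for slug, text in articles if slug not in held_out]
    return "\n\n---\n\n".join(parts) + "\n"


# Hypothetical slugs for the treatment and control articles.
articles = [
    ("what-is-geo", "# What is GEO?\n\nBody text."),
    ("llms-txt-spec", "# The llms.txt spec\n\nBody text."),
]

# During the window: the control is held out, the treatment stays in.
window_bundle = build_llms_full(articles, held_out={"llms-txt-spec"})

# After the window: restore the control by building with no exclusions.
restored_bundle = build_llms_full(articles)
```

Keeping the exclusion as a build parameter, rather than editing the article list, means restoring the control is a one-line change and the pages themselves are never touched.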

The question

We publish llms-full.txt — a single text bundle of our cornerstone articles — on the assumption that LLMs find a concatenated file easier to ingest than crawling pages one by one. Does it actually change anything?

Proposed design

  • Treatment: keep What is GEO? in llms-full.txt.
  • Control: temporarily remove the llms.txt spec article from the llms-full.txt build for the window, then restore it.
  • Primary metric: per-page AI-crawler hits, plus hits on llms-full.txt itself, before vs after.
  • Window: 14-day baseline before the change, then a 30-day review.
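The primary metric above can be sketched as a before/after count of AI-crawler hits per tracked path. This is an illustration under stated assumptions, not the crawler pipeline itself: the record shape `(day, path, user_agent)`, the user-agent tokens, and the three paths are hypothetical stand-ins.

```python
from collections import Counter
from datetime import date

# Assumed user-agent substrings; the real pipeline's list may differ.
AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")

# Hypothetical paths for the treatment, the control, and the bundle itself.
TRACKED_PATHS = ("/what-is-geo", "/llms-txt-spec", "/llms-full.txt")

CHANGE_DATE = date(2026, 6, 21)  # proposed start of the window


def is_ai_crawler(user_agent: str) -> bool:
    """Case-insensitive substring match against the assumed token list."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in AI_CRAWLER_TOKENS)


def count_hits(records):
    """Count AI-crawler hits per (path, period).

    records: iterable of (day, path, user_agent) tuples, where day is a date.
    Returns a Counter keyed by (path, "before" | "after").
    """
    counts = Counter()
    for day, path, user_agent in records:
        if path not in TRACKED_PATHS or not is_ai_crawler(user_agent):
            continue
        period = "before" if day < CHANGE_DATE else "after"
        counts[(path, period)] += 1
    return counts


def daily_rate(counts, path, period, days):
    """Hits per day, so a 14-day baseline and a 30-day review are comparable."""
    return counts[(path, period)] / days
```

Because the baseline and the review period differ in length, comparing daily rates rather than raw totals avoids an apparent rise that comes only from the longer window.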

The cost we won't hide

This is the one candidate where the control arm has a price: the held-out cornerstone gets less machine exposure for a month. That is a deliberate, disclosed trade for a clean answer to "does the bundle matter." If the editor judges the exposure too valuable to risk, the honest alternative is to run this only once a third comparable cornerstone is live, so no flagship page bears the cost.

What would count as a result

A drop in crawl frequency for the held-out control (and/or a relative rise for the included treatment) that recovers after the control is restored. A clean recover-on-restore pattern would be the strongest version of the signal.
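One way to make the recover-on-restore pattern concrete is to compare the control's daily hit rate across three phases: baseline, held out, and restored. The sketch below is illustrative only. The thresholds `min_drop` and `recovery_fraction` are assumptions, not values taken from this design, and the editor would need to fix them before the window opens.

```python
def classify_control(baseline, held_out, restored,
                     min_drop=0.3, recovery_fraction=0.8):
    """Label the control's crawl pattern from three daily hit rates.

    baseline: rate before the control is removed from the bundle.
    held_out: rate while the control is out of the bundle.
    restored: rate after the control is put back.
    min_drop and recovery_fraction are illustrative thresholds.
    """
    if baseline <= 0:
        # With no baseline traffic there is nothing to drop from.
        return "inconclusive: no baseline traffic"
    dropped = held_out <= baseline * (1 - min_drop)
    recovered = restored >= baseline * recovery_fraction
    if dropped and recovered:
        return "drop and recovery"
    if dropped:
        return "drop without recovery"
    return "no clear drop"
```

For example, rates of 10, 5, and 9 hits per day would be labelled "drop and recovery" under these thresholds, while 10, 9, and 10 would be labelled "no clear drop".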

Analysis readout

This experiment has not started. Once it runs and the crawler/citation data accrue, the before/after readout appears here and updates as results land.

Limitations

  • Holding a cornerstone out of llms-full.txt has a real cost: less LLM exposure for that page during the window. The control 'pays' for the experiment. The editor should weigh this before approving.
  • llms-full.txt is new and may be fetched rarely; if almost nothing reads it during the window, the experiment is inconclusive by low base rate, not by null effect.
  • Citation is site-wide, not page-attributable; crawl frequency of each page (and of llms-full.txt itself) is the primary signal.
  • Single domain, single matched pair. Requires the crawler pipeline to be streaming first.

Changelog

  • Published — 31 May 2026

Raw markdown: /lab/experiments/exp-004.md