Does inclusion in llms-full.txt change crawl or citation?
llms-full.txt bundles our cornerstones as one fetchable text file. Does being in it actually get a page read and cited more than being left out? Keep one article in, hold a comparable one out, and measure. Proposed, and the hold-out arm has a real cost that we name up front.
- Hypothesis: An article included in llms-full.txt is fetched and cited more by AI systems than a comparable article deliberately excluded from it.
- What changed: The treatment article remains in llms-full.txt (status quo). The control article is temporarily removed from the llms-full.txt build for the window. Both pages' HTML, schema, and .md aliases are otherwise unchanged.
- Treatment
- Metric: AI-crawler hits
- Window: proposed 21 June 2026 → 21 July 2026
Status: proposed. llms-full.txt is unchanged, and both articles are currently in it. The dates are provisional; the editor decides whether to run this and confirms the pairing.
The question
We publish llms-full.txt, a single text bundle of our cornerstone articles, on the assumption that LLMs find one concatenated file easier to ingest than crawling pages one by one. Does inclusion in it actually change how often a page is fetched or cited?
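Mechanically, the bundle is concatenation, and the experiment's only lever is which articles go into it. A minimal sketch, assuming the build already has each cornerstone's markdown keyed by a slug; `build_bundle`, the slugs, and the `---` separator are hypothetical, not our actual build.

```python
# Minimal sketch of a bundle build with an exclusion set (hypothetical, not our build).
SEPARATOR = "\n\n---\n\n"  # assumed delimiter between articles


def build_bundle(articles: dict[str, str], exclude: frozenset[str] = frozenset()) -> str:
    """Concatenate article markdown (keyed by a hypothetical slug) in stable,
    sorted order, leaving out any slug listed in `exclude`."""
    parts = []
    for slug in sorted(articles):
        if slug in exclude:
            continue  # held-out control: absent from the bundle for the window
        parts.append(articles[slug].strip())
    return SEPARATOR.join(parts) + "\n"


cornerstones = {  # hypothetical slugs and placeholder bodies
    "what-is-geo": "# What is GEO?\n\n(article body)",
    "llms-txt-spec": "# The llms.txt spec\n\n(article body)",
}
status_quo = build_bundle(cornerstones)
held_out = build_bundle(cornerstones, exclude=frozenset({"llms-txt-spec"}))
```

Keeping the exclusion as an explicit set (rather than editing the article list) makes "then restore it" a one-line revert: build again with an empty `exclude`.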
Proposed design
- Treatment: keep "What is GEO?" in llms-full.txt.
- Control: temporarily remove the llms.txt spec article from the llms-full.txt build for the window, then restore it.
- Primary metric: per-page AI-crawler hits, plus hits on llms-full.txt itself, before vs after.
- Window: a 14-day baseline before the change, then a 30-day review period.
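The primary metric can be sketched as a pass over access logs. This assumes the crawler pipeline yields raw lines in Combined Log Format; the user-agent substrings, the tracked paths, and the function names are illustrative assumptions, not a vetted crawler list or our pipeline's API. Because the baseline (14 days) and window (30 days) differ in length, comparisons are made per day rather than on raw counts.

```python
import re
from collections import Counter
from datetime import date, datetime, timedelta
from typing import Iterable, Optional

BASELINE_DAYS = 14  # "14 days before"
WINDOW_DAYS = 30    # 21 June 2026 -> 21 July 2026

# Assumed, incomplete list of AI-crawler user-agent substrings.
AI_CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot", "CCBot")

# Combined Log Format: ... [timestamp] "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'\[(?P<ts>[^\]]+)\] "(?:GET|HEAD) (?P<path>\S+)[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def phase_of(day: date, change: date) -> Optional[str]:
    """Label a day as baseline ("before"), hold-out ("window"), or neither."""
    if change - timedelta(days=BASELINE_DAYS) <= day < change:
        return "before"
    if change <= day < change + timedelta(days=WINDOW_DAYS):
        return "window"
    return None


def crawler_hits(lines: Iterable[str], paths: set[str], change: date) -> Counter:
    """Count AI-crawler requests per (path, phase) for the tracked paths."""
    counts: Counter = Counter()
    for line in lines:
        m = LOG_RE.search(line)
        if m is None or m["path"] not in paths:
            continue
        if not any(bot in m["ua"] for bot in AI_CRAWLERS):
            continue  # ordinary browsers and other bots are ignored
        day = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z").date()
        phase = phase_of(day, change)
        if phase is not None:
            counts[(m["path"], phase)] += 1
    return counts


def per_day(counts: Counter, path: str) -> tuple[float, float]:
    """Hits per day before vs during the window (the phases differ in length)."""
    return counts[(path, "before")] / BASELINE_DAYS, counts[(path, "window")] / WINDOW_DAYS
```

A call would track each article's page path (hypothetically `/what-is-geo` and `/llms-txt-spec`) plus `/llms-full.txt`, with `change=date(2026, 6, 21)`.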
The cost we won't hide
This is the one candidate where the control arm has a price: the held-out cornerstone gets less machine exposure for a month. That is a deliberate, disclosed trade for a clean answer to "does the bundle matter." If the editor judges the exposure too valuable to risk, the honest alternative is to run this only once a third comparable cornerstone is live, so no flagship page bears the cost.
What would count as a result
A drop in crawl frequency for the held-out control (and/or a relative rise for the included treatment) that recovers after the control is restored. A clean recover-on-restore pattern would be the strongest version of the signal.
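One way to make "recovers after the control is restored" checkable is to track the control's crawl rate as a ratio to the treatment's, which covers both a drop for the control and a relative rise for the treatment, and cancels site-wide swings in crawler activity. A sketch under assumptions: the `Rates` shape, the `drop` and `recover_to` thresholds, and a post-restore measurement period (the proposed window does not yet specify one) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Mean AI-crawler hits per day in each phase (hypothetical input shape)."""
    before: float    # baseline, both articles in the bundle
    held_out: float  # control removed from llms-full.txt
    restored: float  # after the control is put back


def recover_on_restore(control: Rates, treatment: Rates,
                       drop: float = 0.2, recover_to: float = 0.9) -> bool:
    """True if the control's rate relative to the treatment falls by at least
    `drop` while held out and climbs back to at least `recover_to` of its
    baseline ratio after restore. Thresholds are illustrative, not pre-registered."""
    phases = (treatment.before, treatment.held_out, treatment.restored)
    if min(phases) <= 0 or control.before <= 0:
        return False  # no baseline to compare against: cannot call a signal
    # Dividing by the treatment cancels site-wide swings in crawler activity.
    base = control.before / treatment.before
    during = control.held_out / treatment.held_out
    after = control.restored / treatment.restored
    dropped = during <= base * (1 - drop)
    recovered = after >= base * recover_to
    return dropped and recovered
```

With `recover_to` above `1 - drop`, a series that drops and simply stays low cannot pass both checks, so only the drop-then-recover shape counts.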
Analysis readout
This experiment has not started. Once it runs and the crawler/citation data accrue, the before/after readout appears here and updates as results land.
Limitations
- Holding a cornerstone out of llms-full.txt has a real cost: less LLM exposure for that page during the window. The control 'pays' for the experiment. The editor should weigh this before approving.
- llms-full.txt is new and may be fetched rarely; if almost nothing reads it during the window, the experiment is inconclusive by low base rate, not by null effect.
- Citation is site-wide, not page-attributable; crawl frequency of each page (and of llms-full.txt itself) is the primary signal.
- Single domain, single matched pair. Requires the crawler pipeline to be streaming first.
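The low-base-rate caveat can be written down before the window opens, so a quiet month is never reported as a null result. A minimal sketch; `verdict` and the `min_fetches` floor of 10 are hypothetical choices, not an agreed threshold.

```python
def verdict(bundle_fetches: int, pattern_seen: bool, min_fetches: int = 10) -> str:
    """Separate "too little traffic to tell" from "looked and found nothing".

    bundle_fetches: AI-crawler fetches of llms-full.txt during the window.
    min_fetches: hypothetical floor, to be agreed before the window opens.
    """
    if bundle_fetches < min_fetches:
        return "inconclusive: low base rate"
    return "signal" if pattern_seen else "null effect"
```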
Changelog
- Published — 31 May 2026
Raw markdown: /lab/experiments/exp-004.md