The Cited

Methodology

How we test, measure, and report on Generative Engine Optimization. Every claim should be reproducible from this page.

Why we publish methodology

GEO is a young discipline. Most of what circulates online is repackaged guesswork: a screenshot of a ChatGPT answer with a caption. We want to be the publication that other publications cite, which means our results must hold up when someone tries to repeat them.

Sources of evidence

Every cornerstone article on The Cited is built from one or more of:

  • LLM probing: running a set of standardised prompts against ChatGPT, Claude, Perplexity, AI Overviews, and Gemini, then recording what each engine surfaces (citations, quoted passages, brand mentions).
  • Site audits: fetching `llms.txt`, `robots.txt`, Schema.org JSON-LD, and HTML structure from a defined target list, then scoring each site against a published rubric.
  • Public datasets: Common Crawl, the Internet Archive, and public model evaluation harnesses, where relevant.
  • Direct experimentation: building `llms.txt` files, Schema.org variants, and answer templates, then measuring the effect on subsequent crawls and citations.
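
As an illustration of the site-audit step, here is a minimal Python sketch, not the audit tooling itself. It builds the two well-known URLs for an origin and checks a `robots.txt` body against a few AI crawler user agents using the standard library's `urllib.robotparser`. The agent list and the function names are assumptions made for this example; the published rubric may check different things.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Assumption: an illustrative crawler list, not the rubric's actual list.
AI_USER_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot"]


def audit_urls(origin: str) -> dict[str, str]:
    """Return the well-known files an audit would fetch for one origin."""
    return {
        "llms_txt": urljoin(origin, "/llms.txt"),
        "robots_txt": urljoin(origin, "/robots.txt"),
    }


def crawler_access(robots_txt: str, page_url: str) -> dict[str, bool]:
    """Report, per user agent, whether robots.txt allows fetching page_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, page_url) for agent in AI_USER_AGENTS}


# Example robots.txt body: blocks one crawler, allows everyone else.
SAMPLE_ROBOTS = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
access = crawler_access(SAMPLE_ROBOTS, "https://example.com/page")
```

Parsing a stored copy of `robots.txt`, rather than fetching it live at scoring time, keeps the score tied to the exact file that was audited.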

How prompts are run

Unless noted otherwise, every LLM probe follows the same protocol:

  1. Fresh session. Each prompt is run in a fresh conversation. No system prompts, memory, or browsing context that we don't explicitly disclose.
  2. Three repetitions. Each prompt is run three times; we report the modal answer and flag any high-variance prompts.
  3. Dated and timezoned. Every result is logged with an ISO 8601 timestamp, including its UTC offset, and the model build string the engine returns.
  4. Geography disclosed. We report from the EU unless the article states otherwise (geography materially affects AI Overviews, Perplexity, and Gemini).
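
The protocol above can be sketched as a small record type plus a summary step. This is an illustrative Python sketch under stated assumptions: the `ProbeRun` fields, the idea of reducing each answer to a comparable label, and the rule that a prompt is "high-variance" when no answer wins a strict majority of runs are all choices made for this example, not a description of our internal tooling.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ProbeRun:
    prompt_id: str
    engine: str
    answer: str       # assumption: answer reduced to a comparable label
    model_build: str  # build string as returned by the engine
    region: str       # e.g. "EU"
    logged_at: str    # ISO 8601 timestamp with UTC offset


def now_iso() -> str:
    """Current time as an ISO 8601 string with an explicit UTC offset."""
    return datetime.now(timezone.utc).isoformat(timespec="seconds")


def summarise(runs: list[ProbeRun]) -> dict:
    """Return the modal answer and a variance flag for one prompt's runs."""
    counts = Counter(run.answer for run in runs)
    modal_answer, modal_count = counts.most_common(1)[0]
    # Assumption: flag the prompt when no answer has a strict majority.
    high_variance = modal_count * 2 <= len(runs)
    return {
        "modal_answer": None if high_variance else modal_answer,
        "agreement": f"{modal_count}/{len(runs)}",
        "high_variance": high_variance,
    }


def make_runs(answers: list[str]) -> list[ProbeRun]:
    """Build example runs for one prompt (placeholder engine and build)."""
    return [
        ProbeRun("p-001", "example-engine", a, "build-unknown", "EU", now_iso())
        for a in answers
    ]
```

With three repetitions, two matching answers give a modal answer, while three different answers produce a flag and no modal answer.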

Sample sizes

For brand benchmarks, the default sample is 50 prompts × 5 brands. For technical audits (`llms.txt`, Schema.org), the default sample is 50–100 domains. When we use a smaller sample, we publish a power note alongside the results.
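
To give a rough sense of what these sample sizes imply, here is a hedged Python sketch. It assumes every prompt is run once per brand with the three repetitions described earlier, and it uses the normal-approximation 95% margin of error for a proportion as a stand-in for a fuller power calculation. Both function names are hypothetical, and a real power note may use a different method.

```python
import math


def runs_required(prompts: int, brands: int, repetitions: int = 3) -> int:
    """Total probe runs per engine, assuming each prompt runs once per brand."""
    return prompts * brands * repetitions


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Normal-approximation margin of error for an observed proportion p over n items."""
    return z * math.sqrt(p * (1 - p) / n)


default_runs = runs_required(prompts=50, brands=5)  # 50 * 5 * 3
worst_case_50 = margin_of_error(p=0.5, n=50)        # widest interval at p = 0.5
worst_case_100 = margin_of_error(p=0.5, n=100)
```

Under these assumptions, a rate measured on 50 domains carries a worst-case margin of roughly ±14 percentage points, and about ±10 points on 100 domains, which is why smaller samples warrant an explicit note.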

Datasets

Where a cornerstone article relies on a structured dataset, we publish the dataset alongside it as JSONL or CSV. Datasets are hosted on this site and survive any change to the underlying article. URLs follow the pattern `/datasets/<slug>.<ext>`.
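
A minimal Python sketch of that convention follows. The helper names, the example slug, and the record fields are hypothetical; only the `/datasets/<slug>.<ext>` pattern and the two formats come from the text above.

```python
import json

ALLOWED_EXTS = {"jsonl", "csv"}


def dataset_path(slug: str, ext: str) -> str:
    """Build the site-relative URL path for a published dataset."""
    if ext not in ALLOWED_EXTS:
        raise ValueError(f"unsupported dataset format: {ext}")
    return f"/datasets/{slug}.{ext}"


def to_jsonl(records: list[dict]) -> str:
    """Serialise records as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


# Hypothetical slug and fields, for illustration only.
path = dataset_path("example-audit", "jsonl")
body = to_jsonl([
    {"domain": "example.com", "has_llms_txt": True},
    {"domain": "example.org", "has_llms_txt": False},
])
```

JSONL suits row-oriented results because each line parses on its own, so a reader can stream or sample the file without loading it whole.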

Errata

When we get something wrong, we say so. Corrections live in the changelog of the affected article (visible at the bottom of every cornerstone piece) and, for material corrections, in the weekly newsletter the following Sunday.

What we don't do

  • No LLM-generated articles. The Cited's articles are written by humans. LLMs are used as research tools (probing, drafting outlines, summarising datasets) and that use is disclosed in the article's "How we wrote this" block.
  • No paid placements without disclosure. See Editorial Policy for the full disclosure framework.
  • No survivorship-only samples. If we publish a list of brands that get cited, we also publish the brands that didn't.

Reach out

Have a methodological objection or a prompt set you'd like us to add to a future benchmark? Email hello@geosalience.com.