GeoSalience
Measurement & Analytics · 5 min read

Who LLMs cite for GEO: 927 sources, 417 domains, zero of them us

Across 50 GEO questions, four LLMs returned 1,105 source links. After setting aside Gemini's opaque redirects, the 927 attributable citations spread across 417 domains — a long tail led by YouTube and SEO-tool blogs. Here's the map, with the grounding caveat up front.


Ask four large language models 50 questions about Generative Engine Optimization and they hand back 1,105 source links. Strip out the 178 that are opaque Gemini redirects (more on that below) and you are left with 927 attributable citations spread across 417 registrable domains — a long tail, not a short list. The single most-cited domain, YouTube, accounts for just 4.0% of them. And across all 417 domains, geosalience.com appears exactly zero times. This is the citation graph a new GEO publication is trying to break into, measured from our own harness on 2026-06-07.

The finding in one line: for GEO questions, LLMs cite a fragmented field dominated by SEO/GEO tool vendors and trade press — there is no single authority, which is both the bad news (no shortcut) and the good news (the field is open).

Read this first: the grounding was uneven

This is a real measurement, but it is not an even four-engine sample, and saying so is the whole point. The 50-prompt run grounded (actually searched the web) very unevenly:

Engine | Prompts it searched | Attributable sources | Attributable domains
Perplexity | 50 / 50 | 388 | 222
Claude | 35 / 50 | 523 | 281
Gemini | 13 / 50 | 0 (see below) | 0
ChatGPT | 3 / 50 | 16 | 15

Two consequences follow, and both shape every number in this article:

  1. Claude and Perplexity carry the sample. Together they supply 911 of the 927 attributable citations, or 98%. ChatGPT searched on only 3 of 50 prompts (it answers GEO questions mostly from training, not retrieval), so its 16 sources are a rounding error here. Gemini hit a free-tier quota: HTTP 429 (Too Many Requests) responses capped it at 13 prompts.
  2. Gemini's sources are opaque. Every one of Gemini's 178 returned links is a vertexaisearch.cloud.google.com/grounding-api-redirect/… wrapper — Google's grounding API hands back a redirect, not the publisher URL. So we cannot attribute a single Gemini citation to a real domain. We bucket those 178 separately and exclude them from the domain ranking rather than letting them collapse into "google.com" and falsely crown it the most-cited domain. That redirect design is itself a finding: 16% of all citations in this run are publisher-anonymous by construction.
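The bucketing rule described above can be sketched in a few lines. This is a minimal illustration, not the code in our repo: the names `bucketSourceUrl` and `countBuckets` and the extra "invalid" bucket are assumptions made for the example. Only the redirect host and path prefix come from the captures described here.

```typescript
// Sketch: separate opaque Gemini grounding redirects from attributable URLs.
// Function and type names are hypothetical; only the redirect host and the
// path prefix are taken from the article.
const REDIRECT_HOST = "vertexaisearch.cloud.google.com";
const REDIRECT_PATH_PREFIX = "/grounding-api-redirect/";

type Bucket = "opaque" | "attributable" | "invalid";

function bucketSourceUrl(raw: string): Bucket {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return "invalid"; // not a parseable absolute URL
  }
  const isRedirect =
    url.hostname === REDIRECT_HOST &&
    url.pathname.startsWith(REDIRECT_PATH_PREFIX);
  return isRedirect ? "opaque" : "attributable";
}

// Tally a list of captured source URLs into the three buckets.
function countBuckets(urls: string[]): Record<Bucket, number> {
  const counts: Record<Bucket, number> = { opaque: 0, attributable: 0, invalid: 0 };
  for (const u of urls) {
    counts[bucketSourceUrl(u)] += 1;
  }
  return counts;
}
```

Checking the bucket before any domain reduction is the design choice that matters: once a redirect URL is reduced to its registrable domain it is indistinguishable from a genuine google.com citation.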

So when you read "who LLMs cite" below, read it as "who Perplexity and Claude cite," with ChatGPT barely participating and Gemini unattributable. A stricter follow-up — forced search, a higher-quota Gemini key — is the obvious next run.
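For readers who want to check the headline shares, the arithmetic is small enough to reproduce by hand. The sketch below uses only the counts from the grounding table and the 178 opaque redirects; the variable names and the `pct` helper are illustrative, not taken from our script.

```typescript
// Per-engine attributable citation counts, copied from the grounding table.
const attributableByEngine = {
  Perplexity: 388,
  Claude: 523,
  Gemini: 0,
  ChatGPT: 16,
};
const opaqueRedirects = 178; // Gemini links that cannot be attributed

const attributableTotal =
  attributableByEngine.Perplexity +
  attributableByEngine.Claude +
  attributableByEngine.Gemini +
  attributableByEngine.ChatGPT; // 927
const allLinks = attributableTotal + opaqueRedirects; // 1105

// Percentage of `whole`, rounded to one decimal place.
function pct(part: number, whole: number): number {
  return Math.round((part / whole) * 1000) / 10;
}

const claudePlusPerplexity =
  attributableByEngine.Claude + attributableByEngine.Perplexity; // 911

console.log(pct(claudePlusPerplexity, attributableTotal)); // 98.3
console.log(pct(opaqueRedirects, allLinks)); // 16.1
```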

The answer-source graph is fragmented, not concentrated

If you expected a handful of authorities to own GEO answers, the data says otherwise. Concentration across the 927 attributable citations:

  • top 1 domain: 4.0%
  • top 5 domains: 11.9%
  • top 10 domains: 19.2%

Four hundred and seventeen domains share 927 citations — roughly two citations per domain. The top ten:

# | Domain | Citations | Share | Category
1 | youtube.com | 37 | 4.0% | video
2 | searchengineland.com | 20 | 2.2% | news/trade
3 | arxiv.org | 19 | 2.0% | academic
4 | google.com | 17 | 1.8% | vendor (real, not redirects)
5 | semrush.com | 17 | 1.8% | vendor
6 | tryprofound.com | 16 | 1.7% | GEO tool
7 | ahrefs.com | 14 | 1.5% | vendor
8 | stackmatix.com | 13 | 1.4% | SEO agency
9 | ziptie.dev | 13 | 1.4% | SEO tool
10 | frase.io | 12 | 1.3% | content/SEO tool

(The google.com here is the genuine article — developers.google.com, support.google.com, chromewebstore.google.com — not the Gemini redirects, which we removed.)
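The concentration figures follow directly from the top-ten counts and the 927 total. As a rough sketch (the `topKShare` helper is a hypothetical name, and the real script may compute this differently), the top-k share is the sum of the k largest counts divided by all attributable citations:

```typescript
// Top-ten citation counts, copied from the table above.
const topTen: Array<[string, number]> = [
  ["youtube.com", 37],
  ["searchengineland.com", 20],
  ["arxiv.org", 19],
  ["google.com", 17],
  ["semrush.com", 17],
  ["tryprofound.com", 16],
  ["ahrefs.com", 14],
  ["stackmatix.com", 13],
  ["ziptie.dev", 13],
  ["frase.io", 12],
];
const ATTRIBUTABLE_TOTAL = 927;

// Share (percent, one decimal) of all attributable citations held by the
// k most-cited domains. `total` is passed in because the long tail beyond
// the listed domains still counts toward the denominator.
function topKShare(ranked: Array<[string, number]>, k: number, total: number): number {
  const sum = ranked
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => b[1] - a[1])
    .slice(0, k)
    .reduce((acc, [, count]) => acc + count, 0);
  return Math.round((sum / total) * 1000) / 10;
}

console.log(topKShare(topTen, 1, ATTRIBUTABLE_TOTAL)); // 4
console.log(topKShare(topTen, 5, ATTRIBUTABLE_TOTAL)); // 11.9
console.log(topKShare(topTen, 10, ATTRIBUTABLE_TOTAL)); // 19.2
```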

Who gets cited: the SEO/GEO tooling industry is writing the answers

Our category tagger is deliberately conservative: it labels only an explicit allowlist and drops everything else into "other," which is why 83.6% of citations land there. That large "other" is not noise — it is the signal we refused to dress up. Scan the top of it and the same kind of domain repeats: commercial SEO and GEO tooling.

Count only the domains anyone can verify as SEO/GEO products: Profound (tryprofound.com, 16), ziptie.dev (13), Frase (12), Otterly (10), SE Ranking (10), Rankability (7), and Evertune (6), plus the two the tagger already marks as vendors, Semrush (17) and Ahrefs (14). Those nine alone account for 105 of the 927 attributable citations (11.3%), more than twice the academic and news categories combined (5.2%), which makes the tooling industry the largest identifiable group of sources LLMs cite about GEO. The people selling GEO software are also the people the models quote about GEO. Trade press (searchengineland.com) and one academic anchor (arxiv.org, the original GEO paper's home) round out the visible head; neutral documentation and forums barely register (forums are 0.3%).

The machine-tagged split, for the record (over attributable citations):

Category | Share
other (mostly long-tail vendor/agency blogs) | 83.6%
vendor (allowlisted) | 8.1%
academic | 2.7%
news | 2.5%
blog platform | 1.9%
reference | 0.9%
forum | 0.3%

The full domain-to-category map is in the dataset so you can re-tag it yourself.
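To make "conservative allowlist" concrete, here is a minimal sketch of that style of tagger. The allowlist entries shown are only the handful whose categories are stated in this article; the real map is in the dataset, and the names `CATEGORY_ALLOWLIST`, `tagCategory`, and `categoryShares` are assumptions made for the example.

```typescript
type Category =
  | "vendor" | "academic" | "news" | "blog platform"
  | "reference" | "forum" | "other";

// Illustrative subset only: the full domain-to-category map is in the dataset.
const CATEGORY_ALLOWLIST: ReadonlyMap<string, Category> = new Map<string, Category>([
  ["semrush.com", "vendor"],
  ["ahrefs.com", "vendor"],
  ["google.com", "vendor"],
  ["arxiv.org", "academic"],
  ["searchengineland.com", "news"],
]);

// Anything not explicitly allowlisted falls through to "other".
function tagCategory(domain: string): Category {
  return CATEGORY_ALLOWLIST.get(domain.toLowerCase()) ?? "other";
}

// Fraction of citations per category, given citation counts per domain.
function categoryShares(counts: ReadonlyMap<string, number>): Map<Category, number> {
  const totals = new Map<Category, number>();
  let all = 0;
  counts.forEach((n, domain) => {
    const cat = tagCategory(domain);
    totals.set(cat, (totals.get(cat) ?? 0) + n);
    all += n;
  });
  const shares = new Map<Category, number>();
  totals.forEach((n, cat) => shares.set(cat, all === 0 ? 0 : n / all));
  return shares;
}
```

Because the default is "other" rather than a best guess, a domain such as tryprofound.com stays untagged until someone adds it to the allowlist, which is why the "other" bucket is so large.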

The two retrieving engines barely agree

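Claude and Perplexity both search aggressively, but they are not reading the same web. Their domain sets have a Jaccard similarity of 0.233: 95 shared domains out of the 408 in their combined set (222 + 281 − 95). That overlap is meaningful, but it leaves roughly three-quarters of the combined set cited by only one of the two engines; about 57% of Perplexity's domains and 66% of Claude's are exclusive to that engine. Their characters differ too: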

  • Perplexity leans on YouTube. 36 of YouTube's 37 citations are Perplexity's; it reaches for video far more than the others, then SEO tools (Ahrefs, Semrush) and docs hosts.
  • Claude leans on text authority. Its top sources are arxiv.org (15), searchengineland.com (15), medium.com (11), and GEO tools (Profound 11, Frase 10) — papers and trade press over video.
  • ChatGPT and Gemini contribute almost nothing here — ChatGPT because it rarely searched, Gemini because its sources are opaque.

The other engine pairs overlap at essentially zero, but that is mostly an artifact of ChatGPT's tiny sample and Gemini's opacity, not a real divergence — another reason to re-run with fuller grounding before reading too much into cross-engine comparisons.
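For reference, the Jaccard similarity of two domain sets is the size of their intersection divided by the size of their union. The sketch below is a generic implementation, not an excerpt from our script; `jaccard` and `exclusiveShare` are hypothetical helper names.

```typescript
// Jaccard similarity: |A ∩ B| / |A ∪ B|, defined as 0 for two empty sets.
function jaccard<T>(a: ReadonlySet<T>, b: ReadonlySet<T>): number {
  let shared = 0;
  a.forEach((item) => {
    if (b.has(item)) shared += 1;
  });
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Fraction of a's members that never appear in b.
function exclusiveShare<T>(a: ReadonlySet<T>, b: ReadonlySet<T>): number {
  let own = 0;
  a.forEach((item) => {
    if (!b.has(item)) own += 1;
  });
  return a.size === 0 ? 0 : own / a.size;
}
```

With the sizes from the grounding table (222 Perplexity domains, 281 Claude domains) and 95 shared, the union is 222 + 281 − 95 = 408 and the similarity is 95 / 408 ≈ 0.233.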

geosalience.com: 0

The honest punchline. Across 927 attributable citations and 417 domains, our own site is cited zero times — consistent with the 0% citation-rate baseline measured in the same run. The incumbents — tool vendors, trade press, a handful of papers — own GEO's answer-sources today. This study is the map of who we have to displace, and it is a first real finding from our public lab rather than a claim about someone else's data.

How we tested, and what this can't tell you

The method: the citation-rate harness ran a locked 50-prompt set (v1) across ChatGPT, Claude, Gemini, and Perplexity on 2026-06-07, capturing every source URL each engine returned. A separate analysis script (scripts/citation-sources-report.ts, open in our repo) reduces those captures to registrable domains, ranks them, and tags categories with a conservative allowlist. Every figure here comes from that script's output over the real captures — nothing is estimated.
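The reduce-and-rank step can be approximated as follows. This is a simplified sketch and not the contents of scripts/citation-sources-report.ts: it assumes a registrable domain is the last two hostname labels, with a tiny hard-coded exception list (`TWO_LABEL_SUFFIXES`, hypothetical) for suffixes like co.uk. A faithful version would consult the full Public Suffix List.

```typescript
// Assumption: a short illustrative list, not the full Public Suffix List.
const TWO_LABEL_SUFFIXES: ReadonlySet<string> = new Set(["co.uk", "com.au", "co.jp"]);

// Reduce a URL to an approximate registrable domain, or null if that is
// not possible (unparseable URL, bare hostname, or IP address).
function registrableDomain(rawUrl: string): string | null {
  let host: string;
  try {
    host = new URL(rawUrl).hostname.toLowerCase();
  } catch {
    return null;
  }
  if (/^\d+(\.\d+){3}$/.test(host)) return null; // IPv4 literal
  const labels = host.split(".").filter((label) => label.length > 0);
  if (labels.length < 2) return null;
  const lastTwo = labels.slice(-2).join(".");
  const keep = TWO_LABEL_SUFFIXES.has(lastTwo) && labels.length >= 3 ? 3 : 2;
  return labels.slice(-keep).join(".");
}

// Count citations per registrable domain, most-cited first
// (ties broken alphabetically so the order is deterministic).
function rankDomains(urls: string[]): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const u of urls) {
    const domain = registrableDomain(u);
    if (domain !== null) counts.set(domain, (counts.get(domain) ?? 0) + 1);
  }
  const ranked: Array<[string, number]> = [];
  counts.forEach((n, domain) => ranked.push([domain, n]));
  return ranked.sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]));
}
```

Under this reduction developers.google.com and support.google.com both count toward google.com. It also shows why the Gemini redirects have to be set aside first: vertexaisearch.cloud.google.com would reduce to google.com as well.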

Limitations, stated plainly:

  • Single run, 50 prompts, one day. This is a snapshot, not a trend. It becomes a series as the monthly harness runs accrue.
  • Uneven grounding (the table up top): the result is really "what Perplexity and Claude cite," with ChatGPT and Gemini under-represented.
  • Whole-domain, not page-level. We count registrable domains; we do not judge which specific page was most useful.
  • Prompt-set bias. A different 50 prompts would surface a different long tail; the set is GEO-centric by design.
  • Conservative categories. "Other" is large on purpose; see the dataset and re-tag if you disagree.

For terms, start with what GEO is, the citation rate definition, and the llms.txt study for the companion machine-readability data. This sits in the measurement pillar alongside our other quantitative work.

Frequently asked questions

Why exclude Gemini's sources?

Because they are not sources you can act on. Gemini returns vertexaisearch.cloud.google.com redirect wrappers instead of publisher URLs, so its 178 links can't be attributed to any domain. We count them as a separate "opaque" bucket (16% of all citations) and rank only the 927 attributable ones. Folding them into google.com would have invented a false #1.

Does "geosalience.com: 0" mean GEO doesn't work?

No. It means we are new. The site went live on 2026-05-31; this run was 2026-06-07. Zero is the honest starting point, and this map shows exactly which domains currently hold the citations we want.

Can I check this?

Yes. The dataset has every ranked domain, its count, and its category; the analysis script and the prompt set are in the repo. Re-run it against the raw captures and you should reproduce these numbers exactly.


Changelog

  • Published — 7 June 2026
  • Updated — 7 June 2026
  • Last reviewed — 7 June 2026