
Who LLMs cite for GEO: 927 sources, 417 domains, zero of them us

Across 50 GEO questions, four LLMs returned 1,105 source links. After setting aside Gemini's opaque redirects, the 927 attributable citations spread across 417 domains — a long tail led by YouTube and SEO-tool blogs. Here's the map, with the grounding caveat up front.

GeoSalience

Published 7 June 2026 · Updated 7 June 2026

Ask four large language models 50 questions about Generative Engine Optimization and they hand back 1,105 source links. Strip out the 178 that are opaque Gemini redirects (more on that below) and you are left with 927 attributable citations spread across 417 registrable domains — a long tail, not a short list. The single most-cited domain, YouTube, accounts for just 4.0% of them. And across all 417 domains, geosalience.com appears exactly zero times. This is the citation graph a new GEO publication is trying to break into, measured from our own harness on 2026-06-07.

The finding in one line: for GEO questions, LLMs cite a fragmented field dominated by SEO/GEO tool vendors and trade press — there is no single authority, which is both the bad news (no shortcut) and the good news (the field is open).

Read this first: the grounding was uneven

This is a real measurement, but it is not an even four-engine sample, and saying so is the whole point. The 50-prompt run grounded (actually searched the web) very unevenly:

Engine	Prompts it searched	Attributable sources	Attributable domains
Perplexity	50 / 50	388	222
Claude	35 / 50	523	281
Gemini	13 / 50	0 (see below)	0
ChatGPT	3 / 50	16	15

Two consequences follow, and both shape every number in this article:

Claude and Perplexity carry the sample. Together they supply 911 of the 927 attributable citations — 98%. ChatGPT searched on only 3 of 50 prompts (it answers GEO questions mostly from training, not retrieval), so its 16 sources are a rounding error here. Gemini hit a free-tier quota (a 429 capped it at 13 prompts).
Gemini's sources are opaque. Every one of Gemini's 178 returned links is a vertexaisearch.cloud.google.com/grounding-api-redirect/… wrapper — Google's grounding API hands back a redirect, not the publisher URL. So we cannot attribute a single Gemini citation to a real domain. We bucket those 178 separately and exclude them from the domain ranking rather than letting them collapse into "google.com" and falsely crown it the most-cited domain. That redirect design is itself a finding: 16% of all citations in this run are publisher-anonymous by construction.

So when you read "who LLMs cite" below, read it as "who Perplexity and Claude cite," with ChatGPT barely participating and Gemini unattributable. A stricter follow-up — forced search, a higher-quota Gemini key — is the obvious next run.
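The bucketing described above can be sketched as follows. This is a minimal illustration, not the code in our repo: the host and path prefix are the ones quoted in this article, while the function names and the sample URLs in the usage note are hypothetical. The arithmetic at the end simply re-derives the published totals.

```typescript
// Sketch only. Host and path prefix are taken from the article text;
// isOpaqueRedirect and splitCaptures are hypothetical names.
const OPAQUE_HOST = "vertexaisearch.cloud.google.com";
const OPAQUE_PATH_PREFIX = "/grounding-api-redirect/";

// True when a captured URL is a Gemini grounding redirect wrapper.
function isOpaqueRedirect(rawUrl: string): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // unparseable strings are not treated as redirects
  }
  return (
    url.hostname.toLowerCase() === OPAQUE_HOST &&
    url.pathname.startsWith(OPAQUE_PATH_PREFIX)
  );
}

// Split captures into an attributable list and a separate opaque bucket,
// so redirects never get counted under google.com.
function splitCaptures(urls: string[]): { attributable: string[]; opaque: string[] } {
  const attributable: string[] = [];
  const opaque: string[] = [];
  for (const u of urls) {
    (isOpaqueRedirect(u) ? opaque : attributable).push(u);
  }
  return { attributable, opaque };
}

// Re-deriving the published figures from the published counts.
const totalLinks = 1105;
const opaqueLinks = 178;
const attributableLinks = totalLinks - opaqueLinks; // 927
const opaqueSharePct = ((100 * opaqueLinks) / totalLinks).toFixed(1); // "16.1"
const claudePlusPerplexity = 523 + 388; // 911
const carriedSharePct = ((100 * claudePlusPerplexity) / attributableLinks).toFixed(1); // "98.3"
```

For example, `splitCaptures(["https://vertexaisearch.cloud.google.com/grounding-api-redirect/abc", "https://developers.google.com/search"])` puts the first (made-up) URL in `opaque` and the second in `attributable`.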

The answer-source graph is fragmented, not concentrated

If you expected a handful of authorities to own GEO answers, the data says otherwise. Concentration across the 927 attributable citations:

top 1 domain: 4.0%
top 5 domains: 11.9%
top 10 domains: 19.2%

Four hundred and seventeen domains share 927 citations, an average of about 2.2 per domain. The top ten:

#	Domain	Citations	Share	Category
1	youtube.com	37	4.0%	video
2	searchengineland.com	20	2.2%	news/trade
3	arxiv.org	19	2.0%	academic
4	google.com	17	1.8%	vendor (real, not redirects)
5	semrush.com	17	1.8%	vendor
6	tryprofound.com	16	1.7%	GEO tool
7	ahrefs.com	14	1.5%	vendor
8	stackmatix.com	13	1.4%	SEO agency
9	ziptie.dev	13	1.4%	SEO tool
10	frase.io	12	1.3%	content/SEO tool

(The google.com here is the genuine article — developers.google.com, support.google.com, chromewebstore.google.com — not the Gemini redirects, which we removed.)
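The concentration figures can be re-derived from the top-ten counts in the table. The snippet below is a small sketch of that arithmetic; `topKShare` is a hypothetical helper name, and the counts and the 927 total are copied from this article.

```typescript
// Sketch only: counts are the ten "Citations" values from the table above,
// already sorted in descending order.
const ATTRIBUTABLE_TOTAL = 927;
const DOMAIN_TOTAL = 417;
const topTenCounts: number[] = [37, 20, 19, 17, 17, 16, 14, 13, 13, 12];

// Share (in percent) of all citations held by the k most-cited domains.
function topKShare(sortedCounts: number[], k: number, total: number): number {
  const head = sortedCounts.slice(0, k).reduce((sum, n) => sum + n, 0);
  return (100 * head) / total;
}

for (const k of [1, 5, 10]) {
  console.log(`top ${k}: ${topKShare(topTenCounts, k, ATTRIBUTABLE_TOTAL).toFixed(1)}%`);
}
// prints:
// top 1: 4.0%
// top 5: 11.9%
// top 10: 19.2%

// Mean citations per domain across the whole long tail.
const meanPerDomain = (ATTRIBUTABLE_TOTAL / DOMAIN_TOTAL).toFixed(2); // "2.22"
```

A top-ten share under 20% is what "long tail" means in practice here: the other 407 domains hold the remaining four-fifths of the citations.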

Who gets cited: the SEO/GEO tooling industry is writing the answers

Our category tagger is deliberately conservative: it labels only an explicit allowlist and drops everything else into "other," which is why 83.6% of citations land there. That large "other" is not noise — it is the signal we refused to dress up. Scan the top of it and the same kind of domain repeats: commercial SEO and GEO tooling.

Count only the domains anyone can verify as SEO/GEO products: Profound (tryprofound.com, 16), ziptie.dev (13), Frase (12), Otterly (10), SE Ranking (10), Rankability (7), and Evertune (6). Add the two the tagger already marks as vendors (Semrush 17, Ahrefs 14) and these nine domains alone account for 105 of the 927 attributable citations, or 11.3%, which is larger than any machine-tagged category except the catch-all "other." The people selling GEO software are also the people the models quote about GEO. Trade press (searchengineland.com) and one academic anchor (arxiv.org, the original GEO paper's home) round out the visible head; neutral documentation and forums barely register (forums are 0.3%).

The machine-tagged split, for the record (over attributable citations):

Category	Share
other (mostly long-tail vendor/agency blogs)	83.6%
vendor (allowlisted)	8.1%
academic	2.7%
news	2.5%
blog platform	1.9%
reference	0.9%
forum	0.3%

The full domain-to-category map is in the dataset so you can re-tag it yourself.
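To make "conservative allowlist" concrete, here is a minimal sketch of that style of tagger. It is an illustration under stated assumptions, not the tagger in our repo: the four allowlist entries are only the pairings this article states explicitly, and `tagDomain` and `categoryShares` are hypothetical names.

```typescript
type Category = "vendor" | "academic" | "news" | "other";

// Illustrative allowlist only; the real map ships with the dataset.
const ALLOWLIST: ReadonlyMap<string, Category> = new Map<string, Category>([
  ["semrush.com", "vendor"],
  ["ahrefs.com", "vendor"],
  ["arxiv.org", "academic"],
  ["searchengineland.com", "news"],
]);

// Anything not explicitly listed falls through to "other".
function tagDomain(domain: string): Category {
  return ALLOWLIST.get(domain.toLowerCase()) ?? "other";
}

// Percentage share of citations per category, given per-domain counts.
function categoryShares(counts: Record<string, number>): Record<Category, number> {
  const totals: Record<Category, number> = { vendor: 0, academic: 0, news: 0, other: 0 };
  let grand = 0;
  for (const domain of Object.keys(counts)) {
    const n = counts[domain] ?? 0;
    totals[tagDomain(domain)] += n;
    grand += n;
  }
  const shares: Record<Category, number> = { vendor: 0, academic: 0, news: 0, other: 0 };
  if (grand === 0) return shares;
  for (const cat of Object.keys(totals) as Category[]) {
    shares[cat] = (100 * totals[cat]) / grand;
  }
  return shares;
}
```

The design choice is the default branch: an unlisted domain such as tryprofound.com is tagged "other" rather than guessed at, which is why that bucket is so large.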

The two retrieving engines barely agree

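Claude and Perplexity both search aggressively, but they are not reading the same web. Their domain sets have a Jaccard similarity of 0.233: 95 shared domains out of 408 in their combined set (222 + 281 - 95). That overlap is meaningful, but it means roughly three-quarters of the combined set is cited by only one of the two engines; about 57% of Perplexity's domains and 66% of Claude's are not cited by the other engine in this run. Their characters differ too: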

Perplexity leans on YouTube. 36 of YouTube's 37 citations are Perplexity's; it reaches for video far more than the others, then SEO tools (Ahrefs, Semrush) and docs hosts.
Claude leans on text authority. Its top sources are arxiv.org (15), searchengineland.com (15), medium.com (11), and GEO tools (Profound 11, Frase 10) — papers and trade press over video.
ChatGPT and Gemini contribute almost nothing here — ChatGPT because it rarely searched, Gemini because its sources are opaque.

The other engine pairs overlap at essentially zero, but that is mostly an artifact of ChatGPT's tiny sample and Gemini's opacity, not a real divergence — another reason to re-run with fuller grounding before reading too much into cross-engine comparisons.
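For readers who want to check the overlap figure, Jaccard similarity is the size of the intersection divided by the size of the union. The sketch below shows both a set-based version and the same calculation from the published counts (222 Perplexity domains, 281 Claude domains, 95 shared); the function names are hypothetical.

```typescript
// Jaccard similarity of two domain sets: |A ∩ B| / |A ∪ B|.
function jaccard(a: Set<string>, b: Set<string>): number {
  let shared = 0;
  a.forEach((domain) => {
    if (b.has(domain)) shared += 1;
  });
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Same formula when only the set sizes and the shared count are known.
function jaccardFromCounts(sizeA: number, sizeB: number, shared: number): number {
  const union = sizeA + sizeB - shared;
  return union === 0 ? 0 : shared / union;
}

// Perplexity (222) vs Claude (281) with 95 shared domains: 95 / 408.
console.log(jaccardFromCounts(222, 281, 95).toFixed(3)); // prints "0.233"
```

With full domain lists from the dataset, `jaccard(new Set(perplexityDomains), new Set(claudeDomains))` should give the same value; those two variable names are placeholders for whatever the dataset export provides.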

geosalience.com: 0

The honest punchline. Across 927 attributable citations and 417 domains, our own site is cited zero times — consistent with the 0% citation-rate baseline measured in the same run. The incumbents — tool vendors, trade press, a handful of papers — own GEO's answer-sources today. This study is the map of who we have to displace, and it is a first real finding from our public lab rather than a claim about someone else's data.

How we tested, and what this can't tell you

The method: the citation-rate harness ran a locked 50-prompt set (v1) across ChatGPT, Claude, Gemini, and Perplexity on 2026-06-07, capturing every source URL each engine returned. A separate analysis script (scripts/citation-sources-report.ts, open in our repo) reduces those captures to registrable domains, ranks them, and tags categories with a conservative allowlist. Every figure here comes from that script's output over the real captures — nothing is estimated.
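As a rough picture of the "reduce to registrable domains, then rank" step, here is a simplified sketch. It is not the contents of scripts/citation-sources-report.ts: the function names are hypothetical, and a tiny hand-written suffix set stands in for the Public Suffix List that a careful implementation would consult.

```typescript
// Assumption: this short list is a stand-in for the Public Suffix List.
const MULTI_PART_SUFFIXES = new Set<string>(["co.uk", "com.au", "co.jp"]);

// Naive registrable-domain reduction: keep the last two labels, or the
// last three when the last two form a known multi-part suffix.
// Hosts that are IP addresses are not handled by this sketch.
function registrableDomain(rawUrl: string): string | null {
  let host: string;
  try {
    host = new URL(rawUrl).hostname.toLowerCase();
  } catch {
    return null; // skip strings that are not valid URLs
  }
  const labels = host.split(".").filter((label) => label.length > 0);
  if (labels.length < 2) return null;
  const lastTwo = labels.slice(-2).join(".");
  if (MULTI_PART_SUFFIXES.has(lastTwo)) {
    return labels.length >= 3 ? labels.slice(-3).join(".") : null;
  }
  return lastTwo;
}

// Count citations per registrable domain and sort by count, then name.
function rankDomains(urls: string[]): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const u of urls) {
    const domain = registrableDomain(u);
    if (domain !== null) counts.set(domain, (counts.get(domain) ?? 0) + 1);
  }
  return Array.from(counts.entries()).sort(
    (a, b) => b[1] - a[1] || a[0].localeCompare(b[0]),
  );
}
```

Under this reduction, developers.google.com and support.google.com both count toward google.com, which matches how the top-ten table treats them.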

Limitations, stated plainly:

Single run, 50 prompts, one day. This is a snapshot, not a trend. It becomes a series as the monthly harness runs accrue.
Uneven grounding (the table up top): the result is really "what Perplexity and Claude cite," with ChatGPT and Gemini under-represented.
Whole-domain, not page-level. We count registrable domains; we do not judge which specific page was most useful.
Prompt-set bias. A different 50 prompts would surface a different long tail; the set is GEO-centric by design.
Conservative categories. "Other" is large on purpose; see the dataset and re-tag if you disagree.

For terms, start with what GEO is, the citation rate definition, and the llms.txt study for the companion machine-readability data. This sits in the measurement pillar alongside our other quantitative work, including the State of GEO Q2 2026, which turns the same harness on 100 brands across the four engines.

Frequently asked questions

Why exclude Gemini's sources?

Because they are not sources you can act on. Gemini returns vertexaisearch.cloud.google.com redirect wrappers instead of publisher URLs, so its 178 links can't be attributed to any domain. We count them as a separate "opaque" bucket (16% of all citations) and rank only the 927 attributable ones. Folding them into google.com would have invented a false #1.

Does "geosalience.com: 0" mean GEO doesn't work?

No. It means we are new. The site went live on 2026-05-31; this run was 2026-06-07. Zero is the honest starting point, and this map shows exactly which domains currently hold the citations we want.

Can I check this?

Yes. The dataset has every ranked domain, its count, and its category; the analysis script and the prompt set are in the repo. Re-run it against the raw captures and you should reproduce these numbers exactly.

Changelog

Published — 7 June 2026
Updated — 7 June 2026
Last reviewed — 7 June 2026

GeoSalience

Editorial

Independent publication on Generative Engine Optimization. Primary research on how AI search engines retrieve, rank, and cite.
