JSON-LD Recipes for Articles, Datasets, and FAQs
Copy-paste JSON-LD blocks that pass schema.org validators, render correctly in Google Rich Results, and surface in AI search engines. Annotated with what each field actually does.
You're not going to remember the entire schema.org spec, and you don't need to. These are the six blocks that cover most practical use on a content site, with notes on what each field is doing. Every snippet here is JSON-LD we run in production on GeoSalience as of 2026-06-07. See What is GEO for why structured data matters to AI search in the first place.
Article
For every editorial post:
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Your article title",
  "description": "The deck — your dek, your subtitle, 1–2 sentences.",
  "image": "https://example.com/og-image.png",
  "datePublished": "2026-05-19T00:00:00Z",
  "dateModified": "2026-05-19T00:00:00Z",
  "author": [
    {
      "@type": "Person",
      "name": "Author Name",
      "url": "https://example.com/authors/author-slug"
    }
  ],
  "publisher": {
    "@type": "Organization",
    "name": "Your Publication",
    "url": "https://example.com",
    "logo": { "@type": "ImageObject", "url": "https://example.com/logo.png" }
  },
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://example.com/article-slug"
  }
}
Field notes:
- headline: keep it under 110 characters or Google Rich Results truncates it.
- image: must be ≥1200px wide for Top Stories eligibility.
- dateModified: bump this every time you make a substantive edit. Some LLMs use it as a freshness signal.
- author: an array, even with one author. Future-proofs co-author additions.
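The field notes above are easy to check mechanically before a post ships. What follows is a minimal Python sketch, not the production code: `check_article` and its warning strings are hypothetical names, the block is assumed to be already parsed into a dict, and the 110-character default simply mirrors the figure quoted in the note.

```python
from datetime import datetime, timezone


def _parse_iso(value: str) -> datetime:
    """Parse an ISO 8601 date or timestamp; treat missing timezones as UTC."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def check_article(block: dict, max_headline: int = 110) -> list[str]:
    """Return human-readable warnings for an Article JSON-LD dict."""
    warnings = []

    headline = block.get("headline", "")
    if len(headline) > max_headline:
        warnings.append(f"headline is {len(headline)} chars (limit {max_headline})")

    # The recipe keeps author as an array even for a single author.
    if not isinstance(block.get("author"), list):
        warnings.append("author should be an array, even for one author")

    published = block.get("datePublished")
    modified = block.get("dateModified")
    if published and modified and _parse_iso(modified) < _parse_iso(published):
        warnings.append("dateModified is earlier than datePublished")

    if "mainEntityOfPage" not in block:
        warnings.append("mainEntityOfPage is missing")

    return warnings
```

An empty list means none of these particular checks fired; it says nothing about Rich Results eligibility, which still needs the validators listed further down.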
FAQPage
For a page that's a list of question-answer pairs:
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is X?",
      "acceptedAnswer": { "@type": "Answer", "text": "X is …" }
    },
    {
      "@type": "Question",
      "name": "How do I Y?",
      "acceptedAnswer": { "@type": "Answer", "text": "To Y, …" }
    }
  ]
}
Field notes:
- Only apply FAQPage to pages that are actually FAQ-shaped. Google has periodically penalised over-use.
- acceptedAnswer.text can include limited HTML. Keep it short and complete: if you need 500 words to answer, FAQPage is the wrong format.
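If the question-answer pairs already live in a CMS or a content file, the block can be generated rather than hand-written. A short Python sketch under that assumption: `faq_page` is a hypothetical helper, and its `max_words` cut-off just reuses the 500-word rule of thumb from the note above.

```python
import json


def faq_page(pairs: list[tuple[str, str]], max_words: int = 500) -> dict:
    """Build a FAQPage block from (question, answer) pairs."""
    if not pairs:
        raise ValueError("FAQPage needs at least one question")
    for question, answer in pairs:
        # Long answers suggest the page is not really FAQ-shaped.
        if len(answer.split()) >= max_words:
            raise ValueError(f"answer to {question!r} is too long for FAQPage")
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question.strip(),
                "acceptedAnswer": {"@type": "Answer", "text": answer.strip()},
            }
            for question, answer in pairs
        ],
    }


block = faq_page([("What is X?", "X is …"), ("How do I Y?", "To Y, …")])
print(json.dumps(block, indent=2, ensure_ascii=False))
```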
Dataset
For a structured-data download (this is our flagship recipe for GEO):
{
  "@context": "https://schema.org",
  "@type": "Dataset",
  "name": "Your Dataset Name — Raw Results",
  "description": "One-sentence description of the columns: e.g. prompt, brand, LLM, citation, ground truth.",
  "url": "https://example.com/datasets/your-dataset.csv",
  "encodingFormat": "text/csv",
  "creator": { "@type": "Organization", "name": "Your Publication", "url": "https://example.com" },
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "datePublished": "2026-06-07",
  "isAccessibleForFree": true,
  "keywords": ["geo", "benchmark", "ai-search", "citations"]
}
Field notes:
- Dataset typing makes a structured download explicitly machine-readable and licensable: it tells an engine "this is data, here is how it's formatted, here is who may use it." We publish our own measurements this way; the citation-rate and crawler datasets behind our lab carry exactly this block, and we're measuring whether it moves our own citation rate over time rather than asserting a number we haven't earned yet.
- license carries the reuse terms with the data. A permissive license like CC-BY states the attribution expectation explicitly, which is the rationale for using it on data you want cited back.
- isAccessibleForFree: true tells the engine the dataset can be linked and fetched without a paywall.
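Two slips are easy to make with this block: dropping the license, and writing the boolean as the string "true". The Python sketch below flags both. Note the assumptions: `check_dataset` is a hypothetical helper, and `EXPECTED_FIELDS` is this sketch's own list, mirroring the recipe above rather than any official required-field set.

```python
# Fields this sketch expects, taken from the recipe above (an assumption,
# not schema.org's or Google's list of required properties).
EXPECTED_FIELDS = ("name", "description", "url", "license", "encodingFormat")


def check_dataset(block: dict) -> list[str]:
    """Return a list of problems found in a Dataset JSON-LD dict."""
    problems = [f"missing {key}" for key in EXPECTED_FIELDS if not block.get(key)]

    # schema.org also allows a CreativeWork object here; only check strings.
    license_value = block.get("license")
    if isinstance(license_value, str) and license_value:
        if not license_value.startswith(("http://", "https://")):
            problems.append("license should be a URL to the license text")

    free = block.get("isAccessibleForFree")
    if free is not None and not isinstance(free, bool):
        problems.append("isAccessibleForFree should be a JSON boolean, not a string")

    return problems
```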
Person
For author bio pages:
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Author Name",
  "jobTitle": "Editor",
  "description": "What this person does.",
  "url": "https://example.com/authors/slug",
  "sameAs": [
    "https://twitter.com/handle",
    "https://linkedin.com/in/handle",
    "https://github.com/handle"
  ]
}
Field notes:
- sameAs is the field LLMs use to disambiguate "Jane Doe at Acme" from any other Jane Doe. Always link your verified social profiles.
- jobTitle is useful but not required.
Organization (for the publisher)
Goes in your site's root layout — one block per page:
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Your Publication",
  "url": "https://example.com",
  "logo": { "@type": "ImageObject", "url": "https://example.com/logo.png" },
  "sameAs": ["https://twitter.com/handle", "https://linkedin.com/company/handle"]
}
Field notes:
- This is a publisher-identity signal. LLMs use it to assess source trustworthiness.
- logo should meet Google's structured-data logo guidance: a rectangular image, ideally 600×60, and at least 112×112px.
BreadcrumbList
For navigation context — Google uses this to render breadcrumbs in SERP:
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    { "@type": "ListItem", "position": 1, "name": "Home", "item": "https://example.com" },
    { "@type": "ListItem", "position": 2, "name": "Foundations", "item": "https://example.com/foundations" },
    { "@type": "ListItem", "position": 3, "name": "What is GEO?", "item": "https://example.com/foundations/what-is-geo" }
  ]
}
Field notes:
- position starts at 1.
- The last item's item URL must match the canonical URL of the current page.
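Both notes can be enforced at build time by generating the list from the page's trail instead of typing positions by hand. A minimal Python sketch: `breadcrumb_list` is a hypothetical helper, and the exact-string comparison against the canonical URL is this sketch's assumption about what "match" means.

```python
def breadcrumb_list(trail: list[tuple[str, str]], canonical_url: str) -> dict:
    """Build a BreadcrumbList from (name, url) pairs, ordered root to current page."""
    if not trail:
        raise ValueError("breadcrumb trail is empty")

    # The last crumb must point at the page itself.
    last_url = trail[-1][1]
    if last_url != canonical_url:
        raise ValueError(f"last item {last_url!r} does not match canonical {canonical_url!r}")

    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            # enumerate(..., start=1) gives the 1-based positions the markup expects.
            {"@type": "ListItem", "position": position, "name": name, "item": url}
            for position, (name, url) in enumerate(trail, start=1)
        ],
    }


crumbs = breadcrumb_list(
    [
        ("Home", "https://example.com"),
        ("Foundations", "https://example.com/foundations"),
        ("What is GEO?", "https://example.com/foundations/what-is-geo"),
    ],
    canonical_url="https://example.com/foundations/what-is-geo",
)
```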
Common mistakes
- Schema in the wrong place: JSON-LD lives in <script type="application/ld+json"> in the <head> (or directly inside the article markup). Don't put it in <noscript> or hidden divs.
- Mismatching dates: datePublished and dateModified should match the visible dates on the page.
- Inflated dateModified: bumping this without actually editing the article is a credibility risk.
- Missing mainEntityOfPage: a small thing, but it's how the search engine confirms which URL this article block is about.
- Schema for SEO that contradicts the page: if headline differs from the visible <h1>, Google flags this and may demote the page.
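On the first item in the list above: when a template builds the script element from a dict, a field value containing "</script>" can end the element early. Here is a minimal Python sketch of one way to serialise safely. `jsonld_script_tag` is a hypothetical helper, not the site's implementation.

```python
import json


def jsonld_script_tag(block: dict) -> str:
    """Serialise a JSON-LD dict into a <script type="application/ld+json"> element."""
    payload = json.dumps(block, ensure_ascii=False, separators=(",", ":"))
    # Escape "<" as a JSON unicode escape so a value containing "</script>"
    # cannot close the element early; JSON parsers decode it back to "<".
    payload = payload.replace("<", "\\u003c")
    return f'<script type="application/ld+json">{payload}</script>'


tag = jsonld_script_tag({"@type": "Article", "headline": "Why </script> breaks pages"})
```

The escaped payload is still valid JSON, so consumers see the original headline text once they parse it.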
Validation
Always validate before deploying:
- Schema.org Validator — catches syntax errors and required-field misses.
- Google Rich Results Test — confirms eligibility for specific Google features.
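A local pre-flight check can catch the cheapest failures (no block at all, broken JSON, a missing @context or @type) before a page reaches either tool. The sketch below uses only the Python standard library. `JsonLdExtractor` and `preflight` are hypothetical names, it only handles a top-level object or array, and it does not replace the validators above.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def preflight(html: str) -> list[str]:
    """Return a list of problems found in a page's JSON-LD blocks."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()

    problems = []
    if not extractor.blocks:
        problems.append("no JSON-LD block found")
    for index, raw in enumerate(extractor.blocks, start=1):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as error:
            problems.append(f"block {index}: invalid JSON ({error.msg})")
            continue
        # A block may hold one object or an array of objects.
        items = data if isinstance(data, list) else [data]
        for item in items:
            if not isinstance(item, dict):
                problems.append(f"block {index}: expected a JSON object")
                continue
            for key in ("@context", "@type"):
                if key not in item:
                    problems.append(f"block {index}: missing {key}")
    return problems
```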
How we wrote this
These recipes are taken directly from production code on GeoSalience, as of 2026-06-07. Inspect lib/seo.ts in our repo for the live implementations. The Article, Organization, BreadcrumbList, and Dataset blocks above all ship on real pages — the lab case study, for instance, carries the Dataset block for its citation-rate data.
See also
- Technical pillar — every article we publish on the technical side of GEO.
- The llms.txt spec: adoption and setup — the other machine-readability layer, alongside JSON-LD.
- What is GEO — the concept these recipes serve.
- This site is our GEO lab: where we run this exact Dataset markup on our own data.
- GEO: the one-line definition, if you landed here cold.
Changelog
- Published — 7 June 2026
- Updated — 7 June 2026
Editorial
Independent publication on Generative Engine Optimization. Primary research on how AI search engines retrieve, rank, and cite.