Diagnosing and Repairing Citation Failures in Generative Engine Optimization

Zhihua Tian, Yuhan Chen, Yao Tang, Jian Liu, Ruoxi Jia

TL;DR

This paper introduces a diagnostic approach to GEO that asks why a document fails to be cited and intervenes accordingly, and develops a unified framework that includes the first taxonomy of citation failure modes spanning the stages of a citation pipeline.

Abstract

Generative Engine Optimization (GEO) aims to improve content visibility in AI-generated responses. However, existing methods measure contribution (how much a document influences a response) rather than citation, the mechanism that actually drives traffic back to creators. These methods also apply generic rewriting rules uniformly, failing to diagnose why individual documents are not cited. This paper introduces a diagnostic approach to GEO that asks why a document fails to be cited and intervenes accordingly. We develop a unified framework comprising: (1) the first taxonomy of citation failure modes spanning different stages of a citation pipeline; (2) AgentGEO, an agentic system that diagnoses failures using this taxonomy, selects targeted repairs from a corresponding tool library, and iterates until citation is achieved; and (3) a document-centric benchmark evaluating whether optimizations generalize across held-out queries. AgentGEO achieves over 40% relative improvement in citation rates while modifying only 5% of content, compared to 25% for baselines. Our analysis reveals that generic optimization can harm long-tail content and that some documents face challenges that optimization alone cannot fully address. These findings have implications for equitable visibility in AI-mediated information access.

Paper Structure (74 sections, 1 equation, 6 figures, 7 tables, 1 algorithm)

Figures (6)

  • Figure 1: Comparison between traditional search engines and LLM-powered generative engines. Traditional search engines return a ranked list of hyperlinks based on relevance to the user query, while generative engines synthesize answers using retrieved content and provide citations to source webpages.
  • Figure 2: A taxonomy of citation failure modes in generative engines, spanning the HTML fetching, parsing, and answer generation stages.
  • Figure 3: Optimization process of $\mathtt{AgentGEO}$. $\mathtt{AgentGEO}$ starts by verifying citation status against competitor documents. For uncited documents, it triggers a "Diagnose-then-Repair" loop in which: (1) an LLM first diagnoses the reason the document was not cited; (2) based on the diagnosis and a query-specific memory of past iterations, a specific tool is selected to modify a surrogate webpage (a copy of the original); and (3) the modified surrogate is retested against the same competitors. The loop repeats until citation is achieved or an iteration limit is reached. All successful modifications are merged via an aggregation strategy into a single suggestion to optimize the original webpage. This procedure is then repeated over additional training queries to further optimize the page.
  • Figure 4: Citation rate improvement over the vanilla baseline across different topics. The numbers on the left indicate the original citation rate for each topic.
  • Figure 5: Distribution of topics and content lengths in the MIMIQ benchmark.
  • ...and 1 more figure
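
The "Diagnose-then-Repair" loop described in the Figure 3 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: every name here (`repair_for_query`, `optimize_page`, the toy citation check, the `missing_answer` failure mode) is hypothetical, the LLM diagnosis and the generative engine are replaced by plain callables, and the paper's aggregation strategy is approximated by simply carrying each successfully repaired surrogate forward to the next training query.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interfaces; the real system would back these with an LLM
# and a generative engine that compares the page against competitors.
CiteCheck = Callable[[str, str], bool]            # (page, query) -> cited?
Diagnoser = Callable[[str, str, List[str]], str]  # (page, query, memory) -> failure mode
RepairTool = Callable[[str, str], str]            # (page, query) -> modified page


def repair_for_query(page: str, query: str, is_cited: CiteCheck,
                     diagnose: Diagnoser, tools: Dict[str, RepairTool],
                     max_iters: int = 3) -> Tuple[str, List[str]]:
    """Run one Diagnose-then-Repair loop for a single query.

    Returns the repaired surrogate and the failure modes addressed,
    or the unchanged page and an empty list if no repair succeeded.
    """
    if is_cited(page, query):       # already cited: nothing to repair
        return page, []
    surrogate = page                # edits go to a copy of the original
    memory: List[str] = []          # query-specific record of past iterations
    for _ in range(max_iters):
        mode = diagnose(surrogate, query, memory)
        tool = tools.get(mode)
        if tool is None:            # no repair tool for this diagnosis
            break
        surrogate = tool(surrogate, query)
        memory.append(mode)
        if is_cited(surrogate, query):   # retest the modified surrogate
            return surrogate, memory
    return page, []                 # iteration limit reached without citation


def optimize_page(page: str, train_queries: List[str], is_cited: CiteCheck,
                  diagnose: Diagnoser, tools: Dict[str, RepairTool],
                  max_iters: int = 3) -> str:
    """Repeat the loop over training queries.

    Assumption: successful edits are kept by carrying the surrogate forward,
    a simple stand-in for the aggregation strategy the caption mentions.
    """
    for query in train_queries:
        page, _ = repair_for_query(page, query, is_cited, diagnose, tools, max_iters)
    return page


# Toy stand-ins so the sketch runs end to end (purely illustrative).
def toy_is_cited(page: str, query: str) -> bool:
    return query.lower() in page.lower()


def toy_diagnose(page: str, query: str, memory: List[str]) -> str:
    return "missing_answer"


def add_answer(page: str, query: str) -> str:
    return page + f"\nQ: {query}\nA: (answer placeholder)"


TOOLS: Dict[str, RepairTool] = {"missing_answer": add_answer}

optimized = optimize_page("A page about GEO.", ["what is geo", "citation rate"],
                          toy_is_cited, toy_diagnose, TOOLS)
```

Working on a surrogate and returning the untouched page when the loop fails means an unsuccessful repair attempt never alters the original, which mirrors the caption's statement that only successful modifications are merged.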