Compound Deception in Elite Peer Review: A Failure Mode Taxonomy of 100 Fabricated Citations at NeurIPS 2025

Samar Ansari

TL;DR

This paper analyzes 100 AI-generated hallucinated citations that appeared in NeurIPS 2025 accepted papers, revealing that 66% were Total Fabrications and that every instance exhibited compound failure modes. The author develops a five-category taxonomy (Total Fabrication, TF; Partial Attribute Corruption, PAC; Identifier Hijacking, IH; Placeholder Hallucination, PH; Semantic Hallucination, SH) and shows that most fabrications layer semantic plausibility and false verifiability, which lets them evade detection during standard peer review. The paper documents a systemic reliability problem in elite peer review and argues for mandatory automated citation verification at submission, including multi-attribute cross-checks (existence, metadata consistency, identifier validation, and semantic plausibility). The findings have implications for other conferences, government reports, and consulting outputs, and they underscore the need to strengthen verification infrastructure as AI writing tools scale. The paper also identifies areas for future research, including contamination inheritance and multi-dimensional deception patterns.

Abstract

Large language models (LLMs) are increasingly used in academic writing workflows, yet they frequently hallucinate by generating citations to sources that do not exist. This study analyzes 100 AI-generated hallucinated citations that appeared in papers accepted by the 2025 Conference on Neural Information Processing Systems (NeurIPS), one of the world's most prestigious AI conferences. Despite review by 3-5 expert researchers per paper, these fabricated citations evaded detection, appearing in 53 published papers (approx. 1% of all accepted papers). We develop a five-category taxonomy that classifies hallucinations by their failure mode: Total Fabrication (66%), Partial Attribute Corruption (27%), Identifier Hijacking (4%), Placeholder Hallucination (2%), and Semantic Hallucination (1%). Our analysis reveals a critical finding: every hallucination (100%) exhibited compound failure modes. The distribution of secondary characteristics was dominated by Semantic Hallucination (63%) and Identifier Hijacking (29%), which often appeared alongside Total Fabrication to create a veneer of plausibility and false verifiability. These compound structures exploit multiple verification heuristics simultaneously, explaining why peer review fails to detect them. The distribution exhibits a bimodal pattern: 92% of contaminated papers contain 1-2 hallucinations (minimal AI use) while 8% contain 4-13 hallucinations (heavy reliance). These findings demonstrate that current peer review processes do not include effective citation verification and that the problem extends beyond NeurIPS to other major conferences, government reports, and professional consulting. We propose mandatory automated citation verification at submission as an implementable solution to prevent fabricated citations from becoming normalized in scientific literature.
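The multi-attribute verification proposed above (existence, metadata consistency, identifier validation) can be illustrated with a minimal sketch. This is not the paper's tool. The `Citation` record, the in-memory `TRUSTED_INDEX`, the `hyp:` identifiers, the example titles, the flag names, and the 0.9 title-similarity threshold are all assumptions made for illustration; a real system would query a bibliographic database instead of a dictionary. Semantic plausibility is left out because checking it requires the citing context.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass(frozen=True)
class Citation:
    title: str
    authors: tuple
    year: int
    identifier: Optional[str] = None  # e.g. a DOI or arXiv ID


# Hypothetical trusted index keyed by identifier (stand-in for a real database).
TRUSTED_INDEX = {
    "hyp:0001": Citation("Example Paper on Sparse Graph Kernels",
                         ("A. Author",), 2021, "hyp:0001"),
    "hyp:0002": Citation("Example Paper on Robust Token Pruning",
                         ("B. Writer",), 2023, "hyp:0002"),
}


def title_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def check_citation(cited: Citation, index: dict, threshold: float = 0.9) -> list:
    """Return a list of flags; an empty list means every check passed."""
    flags = []

    # 1. Existence: does any indexed record have a closely matching title?
    best = max(index.values(),
               key=lambda rec: title_similarity(cited.title, rec.title),
               default=None)
    exists = (best is not None
              and title_similarity(cited.title, best.title) >= threshold)
    if not exists:
        flags.append("no_matching_record")
    # 2. Metadata consistency: the title matches, but authors or year do not.
    elif (best.authors, best.year) != (cited.authors, cited.year):
        flags.append("metadata_mismatch")

    # 3. Identifier validation: the identifier must resolve to the cited work.
    if cited.identifier is not None:
        target = index.get(cited.identifier)
        if target is None:
            flags.append("identifier_unresolved")
        elif title_similarity(cited.title, target.title) < threshold:
            flags.append("identifier_points_elsewhere")

    return flags


# An invented title paired with an identifier that resolves to another record.
suspect = Citation("Adaptive Curriculum Distillation for Vision Models",
                   ("Q. Nobody",), 2024, "hyp:0002")
print(check_citation(suspect, TRUSTED_INDEX))
```

In this sketch, a reference that raises more than one flag at once mirrors the compound pattern described in the abstract: the example above fails the existence check while its identifier still resolves to an unrelated record, which is why a single check on "does the link work" would not be enough.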

Paper Structure (28 sections, 3 figures, 3 tables)

Figures (3)

  • Figure 1: Distribution of 100 hallucinated citations by primary failure mode. Total Fabrication dominates at 66%, with Partial Attribute Corruption accounting for most remaining cases (27%).
  • Figure 2: Distribution of hallucination counts across 53 contaminated papers. The bimodal pattern shows 49 papers (92%) with minimal AI use (1-2 citations), 3 papers (6%) with heavy reliance (4-6 citations), and 1 outlier (2%) with extensive reliance (13 citations). Median = 2, Mean = 1.89, Range: 1-13.
  • Figure 3: Compound failure structure of AI-generated hallucinations. The Venn diagram shows overlap between primary failure modes (outer circles) and secondary characteristics (inner regions). The dominant pattern (TF primary + SH secondary, 63% of all citations) combines wholesale fabrication with semantic plausibility, exploiting the "sounds right" heuristic while providing zero verifiable content. IH secondary (29%) creates false verifiability by providing working links to unrelated papers.
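
The percentages and the mean quoted in the abstract and in Figures 1 and 2 follow directly from the reported counts, as the short arithmetic check below shows. The variable names are illustrative, and only aggregate figures stated in the text are used; the median cannot be recomputed this way because per-paper counts are not listed here.

```python
# Aggregate counts as reported in the abstract and the figure captions.
TOTAL_CITATIONS = 100
TOTAL_PAPERS = 53

# Figure 2: number of contaminated papers in each usage group.
papers_by_group = {
    "1-2 citations": 49,
    "4-6 citations": 3,
    "13 citations": 1,
}

# Abstract / Figure 1: primary failure modes, in percent of the 100 citations.
primary_mode_pct = {"TF": 66, "PAC": 27, "IH": 4, "PH": 2, "SH": 1}

# Share of papers in each group, rounded to whole percent.
group_share_pct = {
    label: round(100 * count / TOTAL_PAPERS)
    for label, count in papers_by_group.items()
}

# Mean number of hallucinated citations per contaminated paper.
mean_per_paper = round(TOTAL_CITATIONS / TOTAL_PAPERS, 2)

print(sum(papers_by_group.values()))   # 53
print(group_share_pct)                 # {'1-2 citations': 92, '4-6 citations': 6, '13 citations': 1 -> 2%}
print(mean_per_paper)                  # 1.89
print(sum(primary_mode_pct.values()))  # 100
```

The 8% "heavy reliance" share given in the abstract is the sum of the 6% and 2% groups in Figure 2 (4 of 53 papers).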