Anatomy of a Lie: A Multi-Stage Diagnostic Framework for Tracing Hallucinations in Vision-Language Models

Lexiang Xiong, Qi Li, Jingwen Ye, Xinchao Wang

Abstract

Vision-Language Models (VLMs) frequently "hallucinate," generating plausible yet factually incorrect statements, which poses a critical barrier to their trustworthy deployment. In this work, we propose a new paradigm for diagnosing hallucinations that recasts them from static output errors into dynamic pathologies of a model's computational cognition. Our framework is grounded in a normative principle of computational rationality, which allows us to model a VLM's generation as a dynamic cognitive trajectory. We design a suite of information-theoretic probes that project this trajectory onto an interpretable, low-dimensional Cognitive State Space. Our central discovery is a governing principle we term the geometric-information duality: a cognitive trajectory's geometric abnormality within this space is fundamentally equivalent to its high information-theoretic surprisal. Hallucination detection can therefore be cast as a geometric anomaly detection problem. Evaluated across diverse settings, from rigorous binary QA (POPE) and comprehensive reasoning (MME) to unconstrained open-ended captioning (MS-COCO), our framework achieves state-of-the-art performance. Crucially, it operates efficiently under weak supervision and remains highly robust even when the calibration data is heavily contaminated. The approach also enables causal attribution of failures, mapping observable errors to distinct pathological states: perceptual instability (measured by Perceptual Entropy), logical-causal failure (measured by Inferential Conflict), and decisional ambiguity (measured by Decision Entropy). Ultimately, this opens a path toward AI systems whose reasoning is transparent, auditable, and diagnosable by design.
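The geometric-information duality has a simple, well-known special case that makes the claim concrete. Assume, purely for illustration (this is our assumption, not the paper's stated construction), that the states of non-hallucinatory trajectories in the Cognitive State Space follow a Gaussian with diagonal covariance fitted to calibration data. Then the surprisal -log p(x) equals half the squared Mahalanobis distance plus a constant, so scoring a state by its geometric distance and scoring it by its surprisal rank states identically. The coordinate names mirror the three probes named in the abstract, but the helpers `fit_diag_gaussian`, `mahalanobis_sq`, and `surprisal` and all numbers are hypothetical.

```python
import math
from statistics import fmean, pvariance

# A cognitive state: (perceptual_entropy, inferential_conflict, decision_entropy).
# The coordinates follow the abstract's probe names; how each probe is computed is NOT shown here.
State = tuple[float, float, float]


def fit_diag_gaussian(states: list[State]) -> tuple[list[float], list[float]]:
    """Fit per-dimension means and variances (a diagonal-covariance Gaussian) to calibration states."""
    columns = list(zip(*states))
    means = [fmean(col) for col in columns]
    variances = [pvariance(col, mu=m) for col, m in zip(columns, means)]
    return means, variances


def mahalanobis_sq(x: State, means: list[float], variances: list[float]) -> float:
    """Geometric score: squared Mahalanobis distance from the calibration mean (diagonal covariance)."""
    return sum((xi - m) ** 2 / v for xi, m, v in zip(x, means, variances))


def surprisal(x: State, means: list[float], variances: list[float]) -> float:
    """Information-theoretic score: -log p(x) in nats under the fitted diagonal Gaussian."""
    log_normalizer = 0.5 * sum(math.log(2 * math.pi * v) for v in variances)
    return 0.5 * mahalanobis_sq(x, means, variances) + log_normalizer


if __name__ == "__main__":
    # Hypothetical calibration states; these numbers are invented, not taken from the paper.
    calibration = [(0.9, 0.10, 0.30), (1.1, 0.20, 0.20), (1.0, 0.15, 0.25), (0.95, 0.12, 0.35)]
    means, variances = fit_diag_gaussian(calibration)
    for name, state in [("typical", (1.0, 0.14, 0.27)), ("anomalous", (2.0, 0.90, 0.70))]:
        print(name, round(mahalanobis_sq(state, means, variances), 2), round(surprisal(state, means, variances), 2))
```

A full-covariance Gaussian gives the same identity, with the inverse covariance matrix in place of the per-dimension variances and 0.5 log det(2πΣ) as the constant; the diagonal version is used here only to keep the sketch dependency-free.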

Paper Structure (40 sections, 4 equations, 8 figures, 2 tables, 1 algorithm)

Figures (8)

  • Figure 1: An example of computational cognitive dissonance in Idefics2, where a cascade of failures leads to a coincidentally correct answer. (1) Perceptual Failure: The model hallucinates a 'motorcycle' in its evidence chain, an object not present in the image (a cyclist is visible). Our framework captures this as high Perceptual Instability (see panel (c)). (2) Logical Failure: The model then contradicts its own faulty evidence and concludes that the final answer is 'No'. This breakdown of self-consistency is diagnosed as extremely high Inferential Conflict (see panel (d)). This case study illustrates the limitations of accuracy-only evaluation and highlights our framework's ability to perform a stage-by-stage differential diagnosis of a VLM's cognitive process, identifying complex, multi-stage failure trajectories.
  • Figure 2: ROC curves of our Cognitive Anomaly Detection (CAD) framework. (a) Linear-scale ROC curves show superior overall performance across all architectures. (b) Log-log ROC curves highlight CAD's dominance in the critical low-FPR regime ($\text{FPR} < 10^{-2}$), which is essential for reliable real-world deployment.
  • Figure 3: Visualizing the 'Cognitive Fingerprints' of Hallucination. Density projections of the 3D Cognitive State Space, separated into non-hallucinatory (blue, top row of each pair) and hallucinatory (red, bottom row of each pair) processes. These manifolds reveal unique failure signatures for each model.
  • Figure 4: Ablation Study. (a) Standalone metrics vs. synergistic gain. The gray hatched area represents the Synergy Gain. (b) Impact of component removal ($\Delta$AUC), revealing model-specific 'diagnostic fingerprints'.
  • Figure 5: Generalization of Perceptual Instability ($H_{\text{Evi}}$) to Open-Ended Captioning (MS-COCO, $N=1000$). Without any task-specific tuning, our perceptual probe consistently assigns significantly higher entropy to hallucinatory captions (red) than to factual ones (blue) across all four architectures. Statistical Significance: Welch's t-test yields $p \ll 0.001$ in all cases, indicating that $H_{\text{Evi}}$ captures a fundamental cognitive signature of hallucination beyond VQA formats. Cross-Model Insight: The shared Y-axis shows that models like Qwen2-VL and DeepSeek-VL2 (bottom row) exhibit higher baseline entropy in their factual generations than Llava and Idefics2.
  • ...and 3 more figures
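Figure 5's caption reports Welch's t-test, which compares two group means without assuming equal variances. The minimal sketch below computes the t statistic and the Welch-Satterthwaite degrees of freedom using only the standard library; the function name `welch_t` and the sample values are hypothetical and do not come from the paper. A p-value additionally requires the Student t distribution, which `scipy.stats.ttest_ind(a, b, equal_var=False)` provides for the same test.

```python
import math
from statistics import fmean, variance


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom for samples a and b."""
    na, nb = len(a), len(b)
    # Squared standard error of each mean, using unbiased (n - 1) sample variances.
    sa, sb = variance(a) / na, variance(b) / nb
    t = (fmean(a) - fmean(b)) / math.sqrt(sa + sb)
    df = (sa + sb) ** 2 / (sa ** 2 / (na - 1) + sb ** 2 / (nb - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical H_Evi values for hallucinatory vs. factual captions (invented for illustration).
    hallucinatory = [1.8, 2.1, 1.9, 2.4, 2.2]
    factual = [1.1, 0.9, 1.3, 1.0, 1.2]
    t, df = welch_t(hallucinatory, factual)
    print(f"t = {t:.2f}, df = {df:.1f}")  # positive t: the first group has the higher mean
```

Because the variances are estimated separately per group, the degrees of freedom are generally non-integer and fall between the smaller group size minus one and the pooled value of na + nb - 2.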