The Phenomenology of Hallucinations

Valeria Ruscio; Keiran Thompson

Abstract

We show that language models hallucinate not because they fail to detect uncertainty, but because they fail to integrate it into output generation. Across architectures, uncertain inputs are reliably identified internally, occupying high-dimensional regions with 2-3$\times$ the intrinsic dimensionality of factual inputs. However, this internal signal is weakly coupled to the output layer: uncertainty migrates into low-sensitivity subspaces, becoming geometrically amplified yet functionally silent. Topological analysis shows that uncertainty representations fragment rather than converge to a unified abstention state, while gradient and Fisher probes reveal collapsing sensitivity along the uncertainty direction. Because cross-entropy training provides no attractor for abstention and uniformly rewards confident prediction, associative mechanisms amplify these fractured activations until residual coupling forces a committed output despite internal detection. Causal interventions confirm this account: refusal is restored when the uncertainty signal is directly connected to the output logits.
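The dimensionality claim above depends on an intrinsic-dimensionality estimate computed over clouds of hidden states. This page does not say which estimator the authors use, so what follows is only a minimal sketch, assuming the widely used Levina-Bickel maximum-likelihood estimator. For a point $x$ with sorted nearest-neighbour distances $T_1(x) \le \dots \le T_k(x)$, the estimator is $\hat m_k(x) = \big[\tfrac{1}{k-1}\sum_{j=1}^{k-1}\log\tfrac{T_k(x)}{T_j(x)}\big]^{-1}$, and the per-point estimates are then averaged. The helper names `knn_distances` and `lid_mle` are hypothetical. The toy data (points on a line and on a plane embedded in 3D) stand in for per-layer hidden-state vectors, which this sketch does not load.

```python
import math
import random


def knn_distances(points, i, k):
    """Sorted Euclidean distances from points[i] to its k nearest neighbours (brute force, O(n) per point)."""
    xi = points[i]
    dists = sorted(math.dist(xi, xj) for j, xj in enumerate(points) if j != i)
    return dists[:k]


def lid_mle(points, k=10):
    """Levina-Bickel MLE of local intrinsic dimensionality, averaged over all points.

    Assumption: this is a generic estimator, not necessarily the one used in the paper.
    """
    estimates = []
    for i in range(len(points)):
        t = knn_distances(points, i, k)
        t_k = t[-1]
        # Skip degenerate neighbourhoods (duplicate points give zero distances).
        if t[0] <= 0.0:
            continue
        # Sum of log(T_k / T_j) over the k-1 closer neighbours.
        s = sum(math.log(t_k / t_j) for t_j in t[:-1])
        if s > 0.0:
            estimates.append((k - 1) / s)
    return sum(estimates) / len(estimates)


# Toy stand-ins for hidden states: a 1-D manifold and a 2-D manifold, both in 3-D space.
rng = random.Random(0)
line = [(u, 2 * u, -u) for u in (rng.random() for _ in range(300))]
plane = [(u, v, u + v) for u, v in ((rng.random(), rng.random()) for _ in range(300))]

line_lid = lid_mle(line)
plane_lid = lid_mle(plane)
```

The estimate tracks the dimension of the manifold the points lie on, not the ambient dimension (3 here). With the $k-1$ normaliser it runs slightly high for small $k$. Applied layer by layer to hidden states of factual versus uncertain prompts, a ratio of the two averages is one way a "2-3$\times$" comparison could be expressed. For real activation sets, a KD-tree or approximate nearest-neighbour index would replace the brute-force search.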

Paper Structure (36 sections, 9 equations, 5 figures, 10 tables)

Introduction
Related Works
Methodology
Analysis
Random-direction controls
Optimization pressure and uncertainty collapse
Conclusion
Acknowledgments
Appendix
Component-Level Mechanisms
Methodology
Analysis
Developmental Trajectory
Additional Tables and Plots
Possible Interventions
...and 21 more sections

Figures (5)

Figure 1: Hessian curvature of factual vs. hallucination-inducing inputs across multiple architectures.
Figure 2: Local intrinsic dimensionality (LID) as a function of depth for different input types across multiple architectures.
Figure 3: Evolution of topological complexity ($\beta_0$) with depth across multiple architectures.
Figure 4: Attention entropy across different architectures.
Figure 5: Examples of intermediate steps in generating images from a prompt that leads to hallucinations.
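Figures 3 and 4 track $\beta_0$ and attention entropy, but the page does not give how either is computed. The sketch below uses the standard definitions under two stated assumptions. First, $\beta_0$ is read off at a single distance scale `eps` as the number of connected components of the graph that joins points within `eps` of each other, rather than from a full persistence diagram. Second, attention entropy is Shannon entropy in nats, averaged over query rows of an attention matrix. `betti0` and `attention_entropy` are hypothetical helper names, and the toy inputs replace per-layer hidden states and attention weights.

```python
import math


def betti0(points, eps):
    """beta_0 at scale eps: connected components of the graph linking points within eps (union-find)."""
    parent = list(range(len(points)))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving keeps trees shallow
            a = parent[a]
        return a

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj  # merge the two components
    return len({find(i) for i in range(len(points))})


def attention_entropy(attn_rows):
    """Mean Shannon entropy (nats) of attention distributions, one row per query token."""
    total = 0.0
    for row in attn_rows:
        # Zero-probability entries contribute nothing (0 * log 0 := 0).
        total -= sum(p * math.log(p) for p in row if p > 0.0)
    return total / len(attn_rows)


# Toy inputs: two well-separated clusters, and a uniform vs. a one-hot attention row.
two_clusters = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
uniform = [[0.25, 0.25, 0.25, 0.25]]
peaked = [[1.0, 0.0, 0.0, 0.0]]
```

Sweeping `eps` and recording `betti0` at each value traces the 0-dimensional persistence barcode. A larger count at a fixed scale means the representation splits into more separate pieces, which is one way to read the abstract's claim that uncertainty representations fragment. A uniform attention row reaches the maximum entropy $\log n$, and a one-hot row gives zero.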
