
The Phenomenology of Hallucinations

Valeria Ruscio, Keiran Thompson

Abstract

We show that language models hallucinate not because they fail to detect uncertainty, but because they fail to integrate it into output generation. Across architectures, uncertain inputs are reliably identified: they occupy high-dimensional regions with 2-3$\times$ the intrinsic dimensionality of factual inputs. However, this internal signal is weakly coupled to the output layer: uncertainty migrates into low-sensitivity subspaces, where it becomes geometrically amplified yet functionally silent. Topological analysis shows that uncertainty representations fragment rather than converge to a unified abstention state, while gradient and Fisher probes reveal collapsing sensitivity along the uncertainty direction. Because cross-entropy training provides no attractor for abstention and uniformly rewards confident prediction, associative mechanisms amplify these fractured activations until residual coupling forces a committed output despite internal detection. Causal interventions confirm this account: directly connecting the uncertainty signal to the logits restores refusal.
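The abstract compares the intrinsic dimensionality of representations for uncertain versus factual inputs. The sketch below shows one common way to estimate local intrinsic dimension (LID) from a point cloud: the Levina-Bickel maximum-likelihood estimator over the k nearest neighbours. This is an assumption for illustration, not necessarily the authors' estimator; the synthetic "line" and "plane" point clouds are hypothetical stand-ins for activation sets of intrinsic dimension 1 and 2, and `k=20` is an arbitrary choice.

```python
import math
import random


def knn_distances(points, query_idx, k):
    """Sorted distances from points[query_idx] to its k nearest neighbours (self excluded)."""
    q = points[query_idx]
    dists = sorted(math.dist(q, p) for i, p in enumerate(points) if i != query_idx)
    return dists[:k]


def lid_mle(points, query_idx, k=20):
    """Levina-Bickel MLE of local intrinsic dimension at one point."""
    t = knn_distances(points, query_idx, k)
    t_k = t[-1]
    # Average log-ratio of the k-th neighbour distance to each closer neighbour.
    s = sum(math.log(t_k / t_j) for t_j in t[:-1]) / (k - 1)
    return 1.0 / s


def mean_lid(points, query_indices, k=20):
    """Average LID over a set of query points."""
    return sum(lid_mle(points, i, k) for i in query_indices) / len(query_indices)


rng = random.Random(0)
# Hypothetical 1-D manifold: a line linearly embedded in 5-D space.
line = [(u, 2 * u, 0.0, -u, 0.5 * u) for u in (rng.random() for _ in range(500))]
# Hypothetical 2-D manifold: a plane linearly embedded in 5-D space.
plane = [(u, v, u + v, 0.0, u - v)
         for u, v in ((rng.random(), rng.random()) for _ in range(500))]

# Query only interior points to limit boundary effects.
line_q = [i for i, p in enumerate(line) if 0.2 < p[0] < 0.8][:30]
plane_q = [i for i, p in enumerate(plane) if 0.2 < p[0] < 0.8 and 0.2 < p[1] < 0.8][:30]

line_lid = mean_lid(line, line_q)
plane_lid = mean_lid(plane, plane_q)
print(f"line LID ~ {line_lid:.2f}, plane LID ~ {plane_lid:.2f}, ratio ~ {plane_lid / line_lid:.2f}")
```

The estimates land near 1 and 2 respectively. Applied per layer to hidden states, the same routine would give a depth profile of LID; the 2-3$\times$ gap reported above refers to the authors' own measurements, not to this toy data.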

Paper Structure

This paper contains 36 sections, 9 equations, 5 figures, and 10 tables.

Figures (5)

  • Figure 1: Hessian curvature of factual vs. hallucination inputs across multiple architectures.
  • Figure 2: Local intrinsic dimension (LID) as a function of depth for different input types across multiple architectures.
  • Figure 3: Evolution of topological complexity ($\beta_0$) with depth across multiple architectures.
  • Figure 4: Attention entropy across different architectures.
  • Figure 5: Examples of intermediate steps in generating images from a prompt that leads to hallucinations.
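Figures 3 and 4 track two quantities: the topological complexity $\beta_0$ and attention entropy. As a hedged sketch of how these are commonly computed (not the authors' pipeline), $\beta_0$ at a scale `eps` can be taken as the number of connected components of the graph linking points closer than `eps` (the 0-th Betti number of a Vietoris-Rips complex), and attention entropy as the Shannon entropy of one row of attention weights. The threshold `eps` and the toy inputs are hypothetical.

```python
import math


def betti0(points, eps):
    """Number of connected components when points closer than eps are linked
    (0-th Betti number of the Vietoris-Rips complex at scale eps)."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj
    return len({find(i) for i in range(len(points))})


def attention_entropy(weights):
    """Shannon entropy (in nats) of one attention row; zero weights contribute nothing."""
    return -sum(w * math.log(w) for w in weights if w > 0)


# Two well-separated clusters: fragmented at small scale, merged at large scale.
clusters = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
b0_small = betti0(clusters, 0.5)   # 2 components
b0_large = betti0(clusters, 10.0)  # 1 component

# Uniform attention over 4 tokens is maximally uncertain (log 4 ~ 1.386 nats);
# attention fully concentrated on one token has zero entropy.
h_uniform = attention_entropy([0.25, 0.25, 0.25, 0.25])
h_peaked = attention_entropy([1.0, 0.0, 0.0, 0.0])
```

Under this reading, a higher $\beta_0$ at a fixed scale indicates that representations split into more disconnected pieces, which is the sense of "fragment" used in the abstract.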