Attention Deficits in Language Models: Causal Explanations for Procedural Hallucinations

Ahmed Karim; Fatima Sheaib; Zein Khamis; Maggie Chlon; Jad Awada; Leon Chlon

TL;DR

This work investigates procedural hallucinations in language models, where correct information is encoded but not used at readout. The authors formalize a two-stage readout framework (Stage 2A gating and Stage 2B binding) and quantify routing efficiency with information-theoretic measures, distinguishing available vs. used information. Empirically, Stage 2B errors dominate in hard long-context binding tasks, yet linear probes can recover the correct value on error trials, validating the 'present but not used' hypothesis. They introduce pseudo-priors and structure-preserving ablations to certify the information budget required to overcome biases, and demonstrate mitigation via activation patching and oracle checkpointing that restates bindings near the query to restore long-distance accuracy. An accompanying reproducibility toolkit provides diagnostics and protocols to apply these methods to API models, enabling practical auditing and mitigation of procedural hallucinations in real-world deployments.

Abstract

Large language models can follow complex procedures yet fail at a seemingly trivial final step: reporting a value they themselves computed moments earlier. We study this phenomenon as \emph{procedural hallucination}: failure to execute a verifiable, prompt-grounded specification even when the correct value is present in context. In long-context binding tasks with a known single-token candidate set, we find that many errors are readout-stage routing failures. Specifically, failures decompose into Stage~2A (gating) errors, where the model does not enter answer mode, and Stage~2B (binding) errors, where it enters answer mode but selects the wrong candidate (often due to recency bias). In the hard regime, Stage~2B accounts for most errors across model families in our tasks (Table~1). On Stage~2B error trials, a linear probe on the final-layer residual stream recovers the correct value far above chance (e.g., 74\% vs.\ 2\% on Qwen2.5-3B; Table~2), indicating that the answer is encoded but not used. We formalize ``present but not used'' via available vs.\ used mutual information and pseudo-prior interventions, yielding output-computable diagnostics and information-budget certificates. Finally, an oracle checkpointing intervention that restates the true binding near the query can nearly eliminate Stage~2B failures at long distance (e.g., Qwen2.5-3B $0/400 \rightarrow 399/400$ at $k = 1024$; Table~8).
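The stage decomposition described in the abstract can be sketched as a small trial classifier. This is an illustrative sketch, not the authors' evaluation code: it assumes that "entering answer mode" is operationalized as the model's top-1 next token falling inside the known single-token candidate set, and the names `Outcome`, `classify_trial`, and `stage_breakdown` are hypothetical.

```python
from collections import Counter
from enum import Enum


class Outcome(Enum):
    CORRECT = "correct"
    STAGE_2A = "stage_2a_gating_error"   # did not enter answer mode
    STAGE_2B = "stage_2b_binding_error"  # answer mode, wrong candidate


def classify_trial(predicted: str, target: str, candidates: set) -> Outcome:
    """Classify one trial from the model's top-1 token.

    Assumption (not from the paper): answer mode means the predicted
    token is a member of the candidate set.
    """
    if target not in candidates:
        raise ValueError("target must be one of the candidates")
    if predicted not in candidates:
        return Outcome.STAGE_2A
    if predicted == target:
        return Outcome.CORRECT
    return Outcome.STAGE_2B


def stage_breakdown(trials) -> dict:
    """Count outcomes over an iterable of (predicted, target, candidates)."""
    counts = Counter(classify_trial(p, t, c) for p, t, c in trials)
    return {outcome: counts.get(outcome, 0) for outcome in Outcome}


if __name__ == "__main__":
    cands = {"A", "B", "C"}
    toy_trials = [
        ("A", "A", cands),    # correct
        ("C", "A", cands),    # binding error, e.g. a more recent value
        ("the", "A", cands),  # gating error: not a candidate token
    ]
    for outcome, n in stage_breakdown(toy_trials).items():
        print(outcome.value, n)
```

Under this operationalization, every error is assigned to exactly one stage, so the Stage 2A and Stage 2B counts sum to the total error count.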

Paper Structure

This paper contains 79 sections, 10 theorems, 25 equations, 2 figures, and 15 tables.

Introduction
Overview of Contributions
Results at a glance (core claims).
Scope.
Contributions (with evidence pointers).
Artifact.
Related Work
Stagewise Slot Population
Prompt Structure
Stage 2A: Does the Model Enter Answer Mode?
Stage 2B: Does the Model Select the Right Candidate?
Why This Decomposition Matters
Information-Theoretic Framework
Available versus Used Information
From Error Rates to Information: Fano Bounds
...and 64 more sections

Key Result

Proposition 1

For any $k$, we have $0\le \eta_k \le 1$.
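A short sketch of why a bound of this form would hold is given below. It rests on assumptions that this page does not state explicitly: that $\eta_k$ is the routing efficiency $I_{\mathrm{used}}/I_{\mathrm{avail}}$ at distance $k$, that the correct value $Y$, an internal state $H_k$, and the emitted answer $\hat{Y}$ form a Markov chain, and that $I_{\mathrm{avail}}(k) > 0$. The symbols $Y$, $H_k$, and $\hat{Y}$ are introduced here for illustration only.

```latex
% Assumed setup (illustrative): Y -> H_k -> \hat{Y} is a Markov chain, and
%   I_avail(k) = I(Y; H_k),   I_used(k) = I(Y; \hat{Y}),   I_avail(k) > 0.
\begin{align*}
  \eta_k &= \frac{I_{\mathrm{used}}(k)}{I_{\mathrm{avail}}(k)}
          = \frac{I(Y;\hat{Y})}{I(Y;H_k)}, \\
  I(Y;\hat{Y}) &\ge 0
    && \text{(non-negativity of mutual information)}, \\
  I(Y;\hat{Y}) &\le I(Y;H_k)
    && \text{(data processing inequality)}, \\
  \Longrightarrow\quad 0 &\le \eta_k \le 1 .
\end{align*}
```

Read this way, $\eta_k = 1$ would mean the readout uses everything that is encoded about the answer, while $\eta_k$ near $0$ matches the "present but not used" regime.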

Figures (2)

Figure 1: Framework overview: Stage 2A gating and Stage 2B binding failures correspond to low routing efficiency $I_{\mathrm{used}}/I_{\mathrm{avail}}$, diagnosed via pseudo-prior interventions.
Figure 2: Spotlight summary of core empirical claims. (A) Stage decomposition for representative hard-regime settings: most errors are Stage 2B (binding) rather than Stage 2A (gating), i.e., the model enters answer mode but selects the wrong candidate. (B) Checkpointing (restating the true binding every 128 tokens near the query) substantially recovers long-distance binding for Qwen2.5-3B; on competing_vars at $k=1024$, it converts 0/400$\to$399/400 correct. Error bars are 95% Wilson binomial confidence intervals over $n=400$ trials per cell.
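The Wilson binomial intervals mentioned in the Figure 2 caption can be computed with the standard Wilson score formula. The sketch below is a generic implementation and is not taken from the paper's toolkit; the function name `wilson_interval` and the hard-coded 95% critical value are choices made here.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959963984540054):
    """Wilson score interval for a binomial proportion.

    z defaults to the two-sided 95% normal critical value.
    Returns (low, high), clamped to [0, 1].
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be in [0, n]")
    p_hat = successes / n
    z2 = z * z
    denom = 1.0 + z2 / n
    center = (p_hat + z2 / (2.0 * n)) / denom
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z2 / (4.0 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    # The two cells quoted in the caption: 0/400 and 399/400 correct.
    for k in (0, 400 - 1):
        low, high = wilson_interval(k, 400)
        print(f"{k}/400: [{low:.4f}, {high:.4f}]")
```

Unlike the simple normal-approximation interval, the Wilson interval stays informative at the extremes: for 0 successes it still has a nonzero upper bound, which matters for cells such as 0/400.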

Theorems & Definitions (21)

Definition 1: Procedural hallucination
Definition 2: Available and used information
Proposition 1: Data processing
Theorem 1: Fano lower bound
Proposition 2: Minimax tightness
Proposition 3: Fano slack decomposition
Corollary 1: Fano inversion
Definition 3: Pseudo-prior
Theorem 2: Bernoulli-projected decompression bound
Corollary 2: Bits-to-trust
...and 11 more
