Hallucination Basins: A Dynamic Framework for Understanding and Controlling LLM Hallucinations

Kalyan Cherukuri, Lav R. Varshney

Abstract

Large language models (LLMs) hallucinate: they produce fluent outputs that are factually incorrect. We present a geometric dynamical systems framework in which hallucinations arise from task-dependent basin structure in latent space. Using autoregressive hidden-state trajectories across multiple open-source models and benchmarks, we find that separability is strongly task-dependent rather than universal: factoid settings can show clearer basin separation, whereas summarization and misconception-heavy settings are typically less stable and often overlap. We formalize this behavior with task-complexity and multi-basin theorems, characterize basin emergence in L-layer transformers, and show that geometry-aware steering can reduce hallucination probability without retraining.
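The abstract's claim that geometry-aware steering can reduce hallucination probability without retraining can be illustrated with a minimal sketch. This is a hypothetical implementation, not the paper's actual method: the centroids, the steering rule (shifting a hidden state along the unit vector from the hallucination centroid toward the factual centroid), and the strength parameter `lam` are all illustrative assumptions.

```python
import numpy as np

def steer_hidden_state(h, factual_centroid, halluc_centroid, lam=1.0):
    """Shift a hidden state along the basin direction (factual centroid
    minus hallucination centroid) -- a common activation-steering recipe,
    used here purely as an illustrative stand-in for the paper's method."""
    direction = factual_centroid - halluc_centroid
    direction = direction / np.linalg.norm(direction)
    return h + lam * direction

rng = np.random.default_rng(0)
d = 16
mu_fact = rng.normal(0.0, 1.0, d)            # synthetic factual centroid
mu_hall = rng.normal(3.0, 1.0, d)            # synthetic hallucination centroid
h = mu_hall + 0.1 * rng.normal(size=d)       # state deep in the hallucination basin

h_steered = steer_hidden_state(h, mu_fact, mu_hall, lam=2.0)
# Steering moves the state strictly closer to the factual centroid.
assert np.linalg.norm(h_steered - mu_fact) < np.linalg.norm(h - mu_fact)
```

In an actual model this shift would be applied to the residual-stream activations at a chosen layer during decoding; here synthetic vectors stand in for hidden states.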

Paper Structure

This paper contains 60 sections, 13 theorems, 66 equations, 17 figures, 5 tables, 2 algorithms.

Key Result

Proposition 5.1

If attention over uninformative contexts, $x \sim \mathcal{C}$, concentrates uniformly, then $\mathbb{E}_{x\sim\mathcal{C}}[\text{Attn}^{(\ell)}(h^{(\ell-1)}(x))] \approx 0$ and thus $h^{(\ell)}(x) \approx \mu^{(\ell)} + O(\sigma_{\mathcal{C}})$, where $\sigma_{\mathcal{C}}$ is the variance of $h^{(\ell)}(x)$ over $x \sim \mathcal{C}$. Additionally, if the Jacobian $J_\ell(\mu^{(\ell)})$ has spectral radius $\rho(J_\ell(\mu^{(\ell)})) < 1$, then $\mu^{(\ell)}$ is a locally attracting fixed point of the layer dynamics.
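The fixed-point mechanism behind Proposition 5.1 can be checked numerically: if the layer map linearized around a reference state $\mu$ has Jacobian with spectral radius below 1, iterates contract toward $\mu$. The sketch below is a toy linear surrogate for a transformer layer (the matrix `J`, dimension, and rescaling to $\rho(J) = 0.9$ are all assumptions for illustration, not the paper's setup).

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8
mu = rng.normal(size=d)                        # candidate reference state
J = rng.normal(size=(d, d))
J *= 0.9 / max(abs(np.linalg.eigvals(J)))      # rescale so rho(J) = 0.9 < 1

def layer_map(h):
    # Linearization of a layer update around mu: mu is a fixed point
    # by construction, since layer_map(mu) == mu.
    return mu + J @ (h - mu)

h = mu + rng.normal(size=d)                    # perturbed initial state
d0 = np.linalg.norm(h - mu)
for _ in range(100):
    h = layer_map(h)

# With spectral radius < 1, the iterates converge to the fixed point mu.
assert np.linalg.norm(h - mu) < 1e-2 * d0
```

The same contraction argument is what makes a reference region attracting under repeated layer application; a spectral radius above 1 in some subspace would instead let trajectories escape along that subspace.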

Figures (17)

  • Figure 1: Task-Dependent Basin Geometry. Llama-3.2-3B's performance on various tasks and 3D PCA-projected outputs. (a) shows performance on MuSiQue, (b) shows performance on HaluEvalQA, (c) shows performance on HaluEvalSummarization, (d) shows performance on TruthfulQA.
  • Figure 2: Causal Intervention: Factual $\to$ Basin. (Left) Dose-response curve: fold increase in hallucination probability as factual hidden states are steered in-model toward the hallucination centroid (interpolation strength $\alpha$ on the horizontal axis). (Right) Bar plot comparing the maximum fold increase produced by steering along the basin direction versus two controls (a random direction and an orthogonal direction). See Appendix \ref{app:intervention} and \ref{app:discussion}.
  • Figure 3: Multi-basin Voronoi structure across models on TruthfulQA. Each panel shows distinct hallucination basins corresponding to different misconception modes.
  • Figure 4: Efficacy of Algorithm \ref{alg:adaptive_steering} in hallucination reduction as a function of the steering strength $\lambda$.
  • Figure 5: Irreversibility summary under autoregressive decoding (HaluEval QA, Llama-3.2-1B, best layer). We report basin-entry, conditional irreversibility, escape-after-entry, and factual entry rates. This verifies Theorem \ref{thm:traj_trap}.
  • ...and 12 more figures
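Figure 3's multi-basin Voronoi structure amounts to partitioning latent space by nearest basin centroid. A minimal sketch of that assignment rule, with hypothetical 2D centroids standing in for the projected basin centers:

```python
import numpy as np

def assign_basin(h, centroids):
    """Nearest-centroid (Voronoi) assignment of a hidden state to a basin."""
    dists = np.linalg.norm(centroids - h, axis=1)
    return int(np.argmin(dists))

# Three illustrative basin centroids in a 2D projected space.
centroids = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])

assert assign_basin(np.array([0.5, 0.2]), centroids) == 0
assert assign_basin(np.array([3.5, 0.5]), centroids) == 1
assert assign_basin(np.array([0.3, 3.8]), centroids) == 2
```

Each Voronoi cell then corresponds to one misconception mode; tracking which cell a trajectory's hidden states fall into over decoding steps gives the basin-entry statistics reported in Figure 5.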

Theorems & Definitions (37)

  • Definition 3.1: Answer Cardinality
  • Proposition 5.1: Reference states as fixed points
  • Proof (Proposition 5.1)
  • Definition 5.2: Reference region
  • Definition 5.3: Hallucination basin
  • Definition 5.4: Radial distance
  • Definition 5.5: Radial contraction
  • Definition 5.6: Subspace radial contraction
  • Proposition 5.7: Manifold attractor
  • Proof (Proposition 5.7)
  • ...and 27 more