Geometric Uncertainty for Detecting and Correcting Hallucinations in LLMs

Edward Phillips, Sean Wu, Soheila Molaei, Danielle Belgrave, Anshul Thakur, David Clifton

TL;DR

The paper tackles hallucinations in large language models by proposing a black-box uncertainty framework grounded in geometry. It introduces Geometric Volume, a global uncertainty score derived from the convex hull volume of archetypes, and Geometric Suspicion, a local score that ranks individual responses within a batch for Best-of-N correction. The approach unifies global and local uncertainty without internal model access and provides a theoretical link between convex hull volume and entropy. Empirical results across medical, scientific, and QA benchmarks show global hallucination detection that is competitive with or better than prior methods, together with notable reductions in hallucination rates, highlighting practical impact in high-stakes domains.

Abstract

Large language models demonstrate impressive results across diverse tasks but are still known to hallucinate, generating linguistically plausible but incorrect answers to questions. Uncertainty quantification has been proposed as a strategy for hallucination detection, requiring estimates for both global uncertainty (attributed to a batch of responses) and local uncertainty (attributed to individual responses). While recent black-box approaches have shown some success, they often rely on disjoint heuristics or graph-theoretic approximations that lack a unified geometric interpretation. We introduce a geometric framework to address this, based on archetypal analysis of batches of responses sampled with only black-box model access. At the global level, we propose Geometric Volume, which measures the convex hull volume of archetypes derived from response embeddings. At the local level, we propose Geometric Suspicion, which leverages the spatial relationship between responses and these archetypes to rank reliability, enabling hallucination reduction through preferential response selection. Unlike prior methods that rely on discrete pairwise comparisons, our approach provides continuous semantic boundary points which have utility for attributing reliability to individual responses. Experiments show that our framework performs comparably to or better than prior methods on short form question-answering datasets, and achieves superior results on medical datasets where hallucinations carry particularly critical risks. We also provide theoretical justification by proving a link between convex hull volume and entropy.

Paper Structure

This paper contains 50 sections, 2 theorems, 17 equations, 5 figures, 9 tables.

Key Result

Theorem 1

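Let $\mathcal{A} = \{ \mathbf{a}_1, \dots, \mathbf{a}_k \} \subset \mathbb{R}^d$ be a set of $k$ affinely independent archetypes. Let $\Delta = \operatorname{conv}(\mathcal{A})$ denote the $(k{-}1)$-dimensional simplex they span, with intrinsic volume $V > 0$ measured using the $(k{-}1)$-dimensional volume measure on the affine hull of $\mathcal{A}$. If $\mathbf{x}$ is a random vector supported on $\Delta$ with a density with respect to this measure, then its differential entropy satisfies $h(\mathbf{x}) \le \log V$, and the upper bound is achieved if and only if $\mathbf{x}$ is uniformly distributed over $\Delta$.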
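
To make the quantities in Theorem 1 concrete, the following sketch computes the intrinsic volume $V$ of the simplex spanned by $k$ archetypes from the Gram determinant of its edge vectors, $V = \sqrt{\det(E E^\top)} / (k-1)!$, and then evaluates $\log V$, which is the differential entropy of a uniform distribution over $\Delta$. The function names and the toy archetypes are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]  # work on a copy
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0  # singular: the points are affinely dependent
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def simplex_volume(archetypes):
    """Intrinsic (k-1)-dimensional volume of conv(archetypes) in R^d."""
    base = archetypes[0]
    # Edge vectors a_i - a_1 for i = 2..k.
    edges = [[x - b for x, b in zip(a, base)] for a in archetypes[1:]]
    gram = [[sum(u * v for u, v in zip(e1, e2)) for e2 in edges] for e1 in edges]
    return math.sqrt(max(determinant(gram), 0.0)) / math.factorial(len(edges))


def uniform_entropy(volume):
    """Differential entropy (in nats) of a uniform distribution on a set of this volume."""
    return math.log(volume) if volume > 0 else float("-inf")


# Toy archetypes (assumed data): a right triangle embedded in R^3,
# and the same triangle scaled by a factor of 3.
tight = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
spread = [[3.0 * x for x in a] for a in tight]

v_tight = simplex_volume(tight)
v_spread = simplex_volume(spread)
entropy_gap = uniform_entropy(v_spread) - uniform_entropy(v_tight)
```

Scaling every archetype by a factor $s$ multiplies $V$ by $s^{k-1}$, so $\log V$ grows by $(k-1)\log s$. A batch whose archetypes are more widely spread therefore has a larger volume and a larger $\log V$.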

Figures (5)

  • Figure 1: A schematic of geometric volume: (1) sample $n$ responses from the LLM, (2) embed and apply dimensionality reduction, (3) perform archetypal analysis and compute the convex hull, and (4) apply a threshold to detect hallucination.
  • Figure 2: We demonstrate local uncertainty metrics using Archetype Rarity, Average Distance to Nearest Neighbors, and Distance from Consensus, which can transform hallucinated responses at temperature 0 into correct responses.
  • Figure 3: UMAP plots for cases where we were able to correct hallucinations with our local uncertainty metric, using the TriviaQA dataset and GPT-4o-mini model. In each plot, the red X shows $r_{\text{default}}$, which was determined to be a hallucination by a judge LLM. The other points show the batch of $n=20$ answers sampled from the same model with a temperature of one. The blue cross shows the answer selected by our framework (i.e. with lowest suspicion), which was determined by the same judge LLM to be a non-hallucination.
  • Figure 4: Distribution across all models and datasets of the $T=1$ hallucination rate when the $T=0$ answer is a hallucination.
  • Figure 5: Geometric Suspicion Utility vs. Corrective Potential. The reduction in hallucination rate ($\Delta H$) is plotted against the median proportion of correct answers found in the sampled batch for instances where the default answer was incorrect. The dashed curve represents a second-order polynomial regression fit highlighting the general scaling trend, while the shaded region indicates the 95% confidence interval. Performance naturally drops in 'confidently wrong' regimes where the sampled batch lacks correct alternatives, but improves as the corrective potential increases.
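
The Best-of-N selection described in Figures 2 and 3 can be sketched roughly as follows. This example scores each response in a batch using two of the local signals named in Figure 2, Distance from Consensus and Average Distance to Nearest Neighbors, and returns the response with the lowest score. The equal-weight sum, the function names, the toy embeddings, and the omission of Archetype Rarity are all assumptions made for illustration; this is not the paper's Geometric Suspicion formula.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def suspicion_scores(embeddings, n_neighbors=2):
    """Hypothetical suspicion score: centroid distance plus mean k-NN distance."""
    n = len(embeddings)
    if n < 2:
        raise ValueError("need at least two responses to rank")
    dim = len(embeddings[0])
    # The batch centroid stands in for the consensus answer.
    centroid = [sum(e[i] for e in embeddings) / n for i in range(dim)]
    scores = []
    for i, e in enumerate(embeddings):
        consensus_dist = euclidean(e, centroid)
        others = sorted(euclidean(e, o) for j, o in enumerate(embeddings) if j != i)
        nearest = others[:n_neighbors]
        knn_dist = sum(nearest) / len(nearest)
        # Assumed equal weighting of the two signals.
        scores.append(consensus_dist + knn_dist)
    return scores


def select_best(responses, embeddings, n_neighbors=2):
    """Return the lowest-suspicion response together with all scores."""
    scores = suspicion_scores(embeddings, n_neighbors)
    best_index = min(range(len(responses)), key=scores.__getitem__)
    return responses[best_index], scores


# Toy batch (assumed data): three clustered answers and one outlier.
responses = ["Paris", "Paris", "Paris, France", "Lyon"]
embeddings = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [2.0, 2.0]]

best, scores = select_best(responses, embeddings)
```

In this toy batch the isolated response receives the highest score and is never selected. As Figure 5 indicates, this kind of selection can only help when the sampled batch actually contains correct alternatives.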

Theorems & Definitions (4)

  • Theorem 1
  • proof
  • Corollary 1
  • proof