From Out-of-Distribution Detection to Hallucination Detection: A Geometric View

Litian Liu; Reza Pourreza; Yubing Jian; Yao Qin; Roland Memisevic

TL;DR

This work revisits hallucination detection through the lens of out-of-distribution (OOD) detection, and suggests that reframing hallucination detection as OOD detection provides a promising and scalable pathway toward language model safety.

Abstract

Detecting hallucinations in large language models is a critical open problem with significant implications for safety and reliability. While existing hallucination detection methods achieve strong performance in question-answering tasks, they remain less effective on tasks requiring reasoning. In this work, we revisit hallucination detection through the lens of out-of-distribution (OOD) detection, a well-studied problem in areas like computer vision. Treating next-token prediction in language models as a classification task allows us to apply OOD techniques, provided appropriate modifications are made to account for the structural differences in large language models. We show that OOD-based approaches yield training-free, single-sample-based detectors, achieving strong accuracy in hallucination detection for reasoning tasks. Overall, our work suggests that reframing hallucination detection as OOD detection provides a promising and scalable pathway toward language model safety.

Paper Structure (33 sections, 2 theorems, 26 equations, 1 figure, 7 tables)

Introduction
Problem Statement
Geometry of Token Generation under a Classification View
Classification View
Feature Proximity to Weight Vectors
Feature Distance to Decision Boundaries
Geometric Uncertainty Signals Hallucination
From OOD to Hallucination Detection
Case Study Setups
Challenge I: Estimating Training Statistics at Scale
Challenge II: Effectiveness and Efficiency in Massive Vocabulary Space
Challenge III: Robustness to Stochastic Generation
Experiments
Main Results
Datasets
...and 18 more sections

Key Result

Theorem 3.3

Adapted from liu2024fast. Given an embedding $\bm{z}$ and a token $c \in \mathcal{V}$ with $c \neq \arg\max_{v \in \mathcal{V}} (\bm{w}_v^\top \bm{z} + b_v)$, the distance $D_f(\bm{z}, c)$ is lower bounded by [expression not preserved in the source]. The proof is given in the appendix.
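The theorem bounds the distance from an embedding to the decision boundary of a token other than the predicted one. A minimal sketch of one such bound follows, assuming it takes the standard point-to-hyperplane form $|(\bm{w}_{\hat{c}} - \bm{w}_c)^\top \bm{z} + (b_{\hat{c}} - b_c)| / \|\bm{w}_{\hat{c}} - \bm{w}_c\|_2$, where $\hat{c}$ is the predicted token. This form is an assumption based on the fDBD detector that the theorem is adapted from; the paper's exact statement may differ. The function name and the toy weights are hypothetical and serve only as illustration.

```python
import numpy as np

def boundary_distance_lower_bounds(z, W, b):
    """Hypothetical helper: point-to-hyperplane distances from z to the
    hyperplanes separating the predicted token from every other token.

    z: (d,) embedding, W: (V, d) output weight vectors, b: (V,) biases.
    Returns (pred, dists); dists[pred] is set to inf because the bound is
    only defined for tokens other than the predicted one.
    """
    logits = W @ z + b
    pred = int(np.argmax(logits))
    # Logit margin between the predicted token and each token (>= 0).
    margin = logits[pred] - logits
    # Norm of the weight difference w_pred - w_c for each token c.
    norm = np.linalg.norm(W[pred] - W, axis=1)
    dists = np.full(len(b), np.inf)
    valid = norm > 0          # skips pred itself and any duplicate weight rows
    dists[valid] = margin[valid] / norm[valid]
    return pred, dists

# Toy example with a 3-token vocabulary in 2 dimensions (assumed values).
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
b = np.zeros(3)
z = np.array([2.0, 1.0])
pred, dists = boundary_distance_lower_bounds(z, W, b)
# Token 0 is predicted; the bound is 1/sqrt(2) for token 1 and 2.0 for token 2.
print(pred, dists)
```

Because the margins and norms are computed for the whole vocabulary with one matrix product and one row-wise norm, a sketch like this needs no training data and only a single forward pass per token.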

Figures (1)

Figure 1: OOD-inspired geometric uncertainty measures can detect hallucinations. (a) Embeddings from hallucinated responses exhibit less proximity to weight vectors than embeddings from correct responses, extending the OOD detector NCI (liu2025detecting). (b) Embeddings from hallucinated responses exhibit smaller distance to decision boundaries than embeddings from correct responses, extending the OOD detector fDBD (liu2024fast). The left panels of (a) and (b) illustrate the proximity score and the distance to the decision boundaries defined in Definition 3.1 and Definition 3.2, respectively. The right panels of (a) and (b) show histograms of the corresponding uncertainty measures on the CSQA dataset with Llama-3.2-3B-Instruct.

Theorems & Definitions (6)

Definition 3.1: Feature Proximity to Weight Vectors
Definition 3.2: Distance to Decision Boundary
Theorem 3.3: Approximate Distance to Decision Boundary
Lemma 4.1: Analytical Solution for Decision-Neutral Closest Point
proof
Definition 4.1: Decision-Neutral Closest Point
