Unsupervised Real-Time Hallucination Detection based on the Internal States of Large Language Models

Weihang Su; Changyue Wang; Qingyao Ai; Yiran HU; Zhijing Wu; Yujia Zhou; Yiqun Liu

TL;DR

This work tackles the challenge of hallucinations in large language models by proposing MIND, an unsupervised framework that detects hallucinations in real time using internal model states during inference. It eliminates the need for manual annotations by automatically generating training data from Wikipedia continuations and training a lightweight MLP classifier on contextualized token embeddings. To enable robust evaluation, the authors introduce HELM, a multi-LLM benchmark that provides generated texts, internal states, and human-annotated hallucination labels. Empirical results show MIND outperforms state-of-the-art reference-free baselines while remaining highly efficient, underscoring its practicality for real-time deployment and enabling safer, more reliable LLM applications.

Abstract

Hallucinations in large language models (LLMs) refer to the phenomenon of LLMs producing responses that are coherent yet factually inaccurate. This issue undermines the effectiveness of LLMs in practical applications, necessitating research into detecting and mitigating hallucinations of LLMs. Previous studies have mainly concentrated on post-processing techniques for hallucination detection, which tend to be computationally intensive and limited in effectiveness due to their separation from the LLM's inference process. To overcome these limitations, we introduce MIND, an unsupervised training framework that leverages the internal states of LLMs for real-time hallucination detection without requiring manual annotations. Additionally, we present HELM, a new benchmark for evaluating hallucination detection across multiple LLMs, featuring diverse LLM outputs and the internal states of LLMs during their inference process. Our experiments demonstrate that MIND outperforms existing state-of-the-art methods in hallucination detection.
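
The detection step summarized above (a lightweight MLP classifier applied to a contextualized token embedding taken from the LLM's internal states) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the single hidden layer, the ReLU and sigmoid activations, the 0.5 threshold, the toy weights, and the names `hallucination_score` and `flag_hallucination`. In practice the embedding would come from the LLM during inference and the weights from training on the automatically generated data.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def relu(x: float) -> float:
    return x if x > 0.0 else 0.0


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dense(x: Vector, weights: Sequence[Vector], bias: Vector) -> List[float]:
    """Affine layer: one output per row of `weights`."""
    return [
        sum(w_i * x_i for w_i, x_i in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]


def hallucination_score(
    embedding: Vector,
    w_hidden: Sequence[Vector],
    b_hidden: Vector,
    w_out: Vector,
    b_out: float,
) -> float:
    """Map one token embedding to a probability-like score in (0, 1).

    Hypothetical one-hidden-layer MLP; the architecture is an assumption.
    """
    hidden = [relu(v) for v in dense(embedding, w_hidden, b_hidden)]
    logit = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return sigmoid(logit)


def flag_hallucination(score: float, threshold: float = 0.5) -> bool:
    """Flag the generation when the score reaches the (assumed) threshold."""
    return score >= threshold


# Toy weights and a toy 3-dimensional "embedding", chosen by hand.
W_HIDDEN = [[0.5, 0.0, 1.0], [0.0, -1.0, 0.0]]
B_HIDDEN = [0.0, 0.0]
W_OUT = [1.0, 1.0]
B_OUT = -3.0

toy_embedding = [1.0, -2.0, 0.5]
toy_score = hallucination_score(toy_embedding, W_HIDDEN, B_HIDDEN, W_OUT, B_OUT)
print(toy_score, flag_hallucination(toy_score))
```

Because the classifier only reads a vector the LLM already computes during generation, scoring adds a few small matrix-vector products per check, which is consistent with the efficiency claim made in the TL;DR.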

Paper Structure (35 sections, 2 equations, 2 figures, 6 tables)

Introduction
Problem Formulation
Methodology
Unsupervised Training Data Generation
Hallucination Classifier Training
Feature Selection
Training Process
Real-time Hallucination Detection
The HELM Benchmark
Data Generation
Human Annotation
Benchmark Analysis and Usage
Experimental Settings
Dataset and Metrics
Baselines
...and 20 more sections

Figures (2)

Figure 1: An illustration of the automatic training data generation process of our proposed framework: MIND.
Figure 2: An illustration of the hallucination classifier training process. "$H_{j}^{i}$" represents the token embedding of the $i^{th}$ token in the $j^{th}$ Transformer layer.
