Hallucination Detection via Internal States and Structured Reasoning Consistency in Large Language Models
Yusheng Song, Lirong Qiu, Xi Zhang, Zhihao Tang
TL;DR
This work tackles the Detection Dilemma in hallucination detection by unifying sub-symbolic Internal State Probing with symbolic Chain-of-Thought Verification. It introduces a multi-path signaling strategy (Direct Answer, Reasoning-Augmented CoT, and Reverse-Inference paths) and a segment-aware temporalized cross-attention module that aligns and fuses heterogeneous representations. The approach achieves state-of-the-art AUROC across fact- and logic-intensive benchmarks on two large language models, validating its generalizability and robustness. By enabling coherent cross-paradigm verification, the framework offers a practical path toward more trustworthy LLM deployments in high-stakes settings.
Abstract
The detection of sophisticated hallucinations in Large Language Models (LLMs) is hampered by a "Detection Dilemma": methods that probe internal states (Internal State Probing) excel at identifying factual inconsistencies but fail on logical fallacies, while methods that verify externalized reasoning (Chain-of-Thought Verification) show the opposite behavior. This schism creates a task-dependent blind spot: Chain-of-Thought Verification fails on fact-intensive tasks such as open-domain QA, where reasoning is ungrounded, while Internal State Probing is ineffective on logic-intensive tasks such as mathematical reasoning, where models are confidently wrong. We propose a unified framework that bridges this gap. Unification, however, faces two fundamental challenges: the Signal Scarcity Barrier, in that coarse symbolic reasoning chains lack signals directly comparable to fine-grained internal states, and the Representational Alignment Barrier, a deep-seated mismatch between the semantic spaces underlying the two kinds of signal. To overcome these, we introduce a multi-path reasoning mechanism that yields more comparable, fine-grained signals, and a segment-aware temporalized cross-attention module that adaptively fuses the resulting aligned representations and pinpoints subtle dissonances between them. Extensive experiments on three diverse benchmarks and two leading LLMs demonstrate that our framework consistently and significantly outperforms strong baselines. Our code is available at https://github.com/peach918/HalluDet.
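
To make the fusion idea concrete, the following is a minimal sketch of generic scaled dot-product cross-attention, in which internal-state vectors act as queries over reasoning-step vectors. It is not the paper's module: the function name `cross_attention`, the segment ids, and the additive `same_segment_bias` are hypothetical stand-ins for "segment-aware" behavior, and the temporal component mentioned in the abstract is not modeled at all.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values,
                    query_segments=None, key_segments=None,
                    same_segment_bias=0.0):
    """Generic scaled dot-product cross-attention (illustrative only).

    queries: list of vectors, e.g. internal-state features per token.
    keys, values: lists of vectors, e.g. reasoning-step features.
    query_segments, key_segments: optional integer segment ids
        (hypothetical; the abstract does not define its segments).
    same_segment_bias: assumed additive bonus applied to the score
        when a query and a key share a segment id.
    Returns (fused vectors, attention weights).
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    use_segments = query_segments is not None and key_segments is not None
    fused, weights = [], []
    for qi, q in enumerate(queries):
        scores = []
        for ki, k in enumerate(keys):
            score = scale * sum(a * b for a, b in zip(q, k))
            if use_segments and query_segments[qi] == key_segments[ki]:
                score += same_segment_bias
            scores.append(score)
        row = softmax(scores)
        weights.append(row)
        # Weighted sum of value vectors, one output per query.
        fused.append([
            sum(row[j] * values[j][d] for j in range(len(values)))
            for d in range(len(values[0]))
        ])
    return fused, weights


# Toy inputs: two internal-state vectors attend over three reasoning steps.
internal_states = [[1.0, 0.0], [0.0, 1.0]]
reasoning_steps = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused, weights = cross_attention(
    internal_states, reasoning_steps, reasoning_steps,
    query_segments=[0, 1], key_segments=[0, 1, 1],
    same_segment_bias=1.0,
)
```

In this toy setup each row of `weights` sums to one, and the first query places most of its weight on the reasoning step that both matches it and shares its segment. A downstream detector could, under the same assumptions, score the disagreement between each query and its fused vector.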
