Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus
Tianhang Zhang, Lin Qiu, Qipeng Guo, Cheng Deng, Yue Zhang, Zheng Zhang, Chenghu Zhou, Xinbing Wang, Luoyi Fu
TL;DR
Large language models frequently produce hallucinations that are untruthful or nonsensical, and existing detectors rely on external knowledge bases or multiple sampled responses. The authors propose a reference-free, uncertainty-based detector that mimics human factuality checking through three focus areas: keyword-focused uncertainty, propagation of uncertainty via attention, and token-property conditioning using entity-type signals. The method computes token- and sentence-level scores from quantities such as $h_i = -\log p_i(t_i) + \mathcal{H}_i$, $\hat{h}_i$, and $\hat{p}(t)$, and detects hallucinations effectively on WikiBio GPT-3 and other datasets without external data. Experiments show state-of-the-art performance across diverse proxy models and effectiveness on small-model outputs, offering a practical, scalable approach to improving LLM factuality in real-world applications.
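As a rough illustration of the token-level score $h_i = -\log p_i(t_i) + \mathcal{H}_i$, the sketch below computes it from a single next-token distribution and then averages the scores over keyword positions. It rests on several assumptions not stated in this summary: $\mathcal{H}_i$ is read as the Shannon entropy of the predictive distribution, natural logarithms are used, and the sentence score is a plain mean over keywords. The function names `token_uncertainty` and `sentence_uncertainty` and the boolean keyword mask are hypothetical, and the paper's exact definitions may differ.

```python
import math


def token_uncertainty(probs, token_index):
    """Return -log p(t) + H for one position (illustrative, not the paper's code).

    probs       -- predictive distribution over the vocabulary at this position
    token_index -- index of the token that was actually generated
    """
    # Surprisal of the generated token: large when the model found it unlikely.
    neg_log_prob = -math.log(probs[token_index])
    # Shannon entropy of the whole distribution (assumed meaning of H_i).
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return neg_log_prob + entropy


def sentence_uncertainty(token_scores, keyword_mask):
    """Average token scores over positions flagged as keywords (assumed aggregation)."""
    selected = [h for h, keep in zip(token_scores, keyword_mask) if keep]
    if not selected:
        return 0.0  # no keywords: nothing to score
    return sum(selected) / len(selected)


if __name__ == "__main__":
    confident = token_uncertainty([0.97, 0.01, 0.01, 0.01], 0)
    unsure = token_uncertainty([0.25, 0.25, 0.25, 0.25], 0)
    print(f"confident token: {confident:.3f}")
    print(f"uncertain token: {unsure:.3f}")
    print(f"sentence score:  {sentence_uncertainty([confident, unsure], [True, True]):.3f}")
```

Under these assumptions a peaked distribution yields a score near zero, while a flat distribution is penalized twice: once for the low probability of the chosen token and once for the high entropy. A higher sentence score would then suggest a higher risk of hallucination.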
Abstract
Large Language Models (LLMs) have gained significant popularity for their impressive performance across diverse fields. However, LLMs are prone to hallucinate untruthful or nonsensical outputs that fail to meet user expectations in many real-world applications. Existing works for detecting hallucinations in LLMs either rely on external knowledge for reference retrieval or require sampling multiple responses from the LLM for consistency verification, making these methods costly and inefficient. In this paper, we propose a novel reference-free, uncertainty-based method for detecting hallucinations in LLMs. Our approach imitates human focus in factuality checking from three aspects: 1) focus on the most informative and important keywords in the given text; 2) focus on the unreliable tokens in historical context which may lead to a cascade of hallucinations; and 3) focus on the token properties such as token type and token frequency. Experimental results on relevant datasets demonstrate the effectiveness of our proposed method, which achieves state-of-the-art performance across all the evaluation metrics and eliminates the need for additional information.
