
Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus

Tianhang Zhang, Lin Qiu, Qipeng Guo, Cheng Deng, Yue Zhang, Zheng Zhang, Chenghu Zhou, Xinbing Wang, Luoyi Fu

TL;DR

Large language models frequently produce hallucinations that are untruthful or nonsensical, and existing detectors rely on external knowledge bases or multiple sampled responses. The authors propose a reference-free, uncertainty-based detector that mimics human factuality checking through three focus areas: keyword-focused uncertainty, propagation of uncertainty via attention, and token-property conditioning using entity-type signals. The method computes token- and sentence-level scores with formulas such as $h_i = -\log p_i(t_i) + \mathcal{H}_i$, $\hat{h}_i$, and $\hat{p}(t)$, demonstrating effective detection on WikiBio GPT-3 and other datasets without external data. Experiments reveal state-of-the-art performance across diverse proxy models and show effectiveness for small-model outputs, offering a practical, scalable approach to improving LLM factuality in real-world applications.
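As a rough illustration of the token-level score $h_i = -\log p_i(t_i) + \mathcal{H}_i$ mentioned above, the following minimal sketch combines the negative log-probability of the generated token with the entropy of the proxy model's next-token distribution. Several details are assumptions made for illustration and are not taken from the paper: natural logarithms are used for both terms, the function names `token_uncertainty` and `keyword_mean_uncertainty` are hypothetical, and the sentence-level score is taken to be a plain average over tokens flagged as keywords. The attention-based propagation and token-property corrections ($\hat{h}_i$, $\hat{p}(t)$) are not modeled here.

```python
import math


def token_uncertainty(probs, token_id):
    """Return -log p(t) + H for one position.

    probs    : next-token probabilities over the vocabulary (sums to 1)
    token_id : index of the token that was actually generated
    Natural log is an assumption; the base used in the paper may differ.
    """
    nll = -math.log(probs[token_id])
    # Entropy of the full distribution; zero-probability entries contribute 0.
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return nll + entropy


def keyword_mean_uncertainty(dists, token_ids, keyword_mask):
    """Average the token scores over positions flagged as keywords.

    Plain averaging is an illustrative assumption, not the paper's
    confirmed sentence-level aggregation.
    """
    scores = [
        token_uncertainty(dist, tok)
        for dist, tok, is_kw in zip(dists, token_ids, keyword_mask)
        if is_kw
    ]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    uniform = [0.25, 0.25, 0.25, 0.25]   # maximally uncertain position
    confident = [1.0, 0.0, 0.0, 0.0]     # fully confident position
    print(token_uncertainty(uniform, 0))
    print(token_uncertainty(confident, 0))
    print(keyword_mean_uncertainty([uniform, confident], [0, 0], [True, False]))
```

In this sketch a higher score means the proxy model was both surprised by the token and unsure about the position overall, which is the intuition behind treating the score as a hallucination signal.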

Abstract

Large Language Models (LLMs) have gained significant popularity for their impressive performance across diverse fields. However, LLMs are prone to hallucinate untruthful or nonsensical outputs that fail to meet user expectations in many real-world applications. Existing works for detecting hallucinations in LLMs either rely on external knowledge for reference retrieval or require sampling multiple responses from the LLM for consistency verification, making these methods costly and inefficient. In this paper, we propose a novel reference-free, uncertainty-based method for detecting hallucinations in LLMs. Our approach imitates human focus in factuality checking from three aspects: 1) focus on the most informative and important keywords in the given text; 2) focus on the unreliable tokens in historical context which may lead to a cascade of hallucinations; and 3) focus on the token properties such as token type and token frequency. Experimental results on relevant datasets demonstrate the effectiveness of our proposed method, which achieves state-of-the-art performance across all the evaluation metrics and eliminates the need for additional information.

Paper Structure

This paper contains 25 sections, 8 equations, 11 figures, 29 tables.

Figures (11)

  • Figure 1: (a) Using a naive proxy model can hinder the focus on hallucination itself: 1) considering all tokens within the given text may introduce noise; 2) the hallucinated tokens might be assigned high probabilities (green bar) due to the overconfidence problem; 3) factual tokens may receive low probabilities (red bar) due to the underconfidence problem. (b) To strengthen such focus, we imitate how humans perform factuality checking from three aspects: 1) focus on the informative keywords; 2) focus on the preceding words by propagating the uncertainty through attention weights; 3) focus on the token properties by providing entity type before each named entity.
  • Figure 2: The attention heat map after max-pooling for all the layers and attention heads when generating the example using llama-30b, where the x-axis only presents the first and last sentence, while the y-axis only includes the last sentence due to space constraints. The brightness of each rectangle represents the attention score between the corresponding tokens, with brighter shades indicating higher scores.
  • Figure 3: An example of providing entity type preceding named entities: the top-3 words that follow the incomplete sentence are all related to dates. Despite having the highest probability in Figure 1(a), the token "West" is generated with a relatively low probability of 0.03.
  • Figure 4: The attention heat map corresponding to the first case in the penalty case study section. Due to space limitations, not all sentences are depicted in the figure.
  • Figure 5: The attention heat map corresponding to the second case in the penalty case study section. Due to space limitations, not all sentences are depicted in the figure.
  • ...and 6 more figures