A Multiple-Fill-in-the-Blank Exam Approach for Enhancing Zero-Resource Hallucination Detection in Large Language Models

Satoshi Munakata, Taku Fukui, Takao Mohri

TL;DR

This paper proposes a hallucination detection method based on a multiple-fill-in-the-blank exam approach. When ensembled with existing methods, it achieves clearer state-of-the-art performance.

Abstract

Large language models (LLMs) often fabricate hallucinatory text. Several methods have been developed to detect such text by semantically comparing it with multiple versions that are probabilistically regenerated. However, a significant issue is that if the storyline of each regenerated text changes, the generated texts become incomparable, which worsens detection accuracy. In this paper, we propose a hallucination detection method that incorporates a multiple-fill-in-the-blank exam approach to address this storyline-changing issue. First, our method creates a multiple-fill-in-the-blank exam by masking multiple objects in the original text. Second, it prompts an LLM to repeatedly answer this exam. This approach ensures that the storylines of the exam answers align with that of the original text. Finally, it quantifies the degree of hallucination for each original sentence by scoring the exam answers, taking into account the potential for hallucination snowballing within the original text itself. Experimental results show that our method not only outperforms existing methods on its own, but also achieves clearer state-of-the-art performance when ensembled with existing methods.
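
The three steps described in the abstract (mask, answer repeatedly, score) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the functions `make_exam`, `score_sentences`, and `detect`, the `answer_exam` callable standing in for an LLM, the hand-picked `targets` list, and the case-insensitive exact-match scoring. The paper's own object selection, prompts, and snowballing-aware scoring are not reproduced.

```python
from typing import Callable, List, Sequence, Tuple

BLANK = "____"  # hypothetical blank marker, not taken from the paper


def make_exam(sentences: Sequence[str],
              targets: Sequence[str]) -> Tuple[List[str], List[List[str]]]:
    """Mask each target phrase and return (masked sentences, answer key per sentence).

    Assumes target phrases do not overlap one another.
    """
    masked_sentences, answer_keys = [], []
    for sentence in sentences:
        # Order the targets by where they appear so blanks and keys line up.
        hits = sorted((sentence.find(t), t) for t in targets if t in sentence)
        masked = sentence
        for _, target in hits:
            masked = masked.replace(target, BLANK, 1)
        masked_sentences.append(masked)
        answer_keys.append([t for _, t in hits])
    return masked_sentences, answer_keys


def score_sentences(answer_keys: List[List[str]],
                    sampled_answers: List[List[List[str]]]) -> List[float]:
    """Per-sentence score: fraction of sampled answers that disagree with the original.

    Simplification: exact match, and no adjustment for hallucination snowballing.
    """
    scores = []
    for i, key in enumerate(answer_keys):
        if not key:
            scores.append(0.0)  # nothing was masked in this sentence
            continue
        misses = sum(
            answer.strip().lower() != gold.lower()
            for sample in sampled_answers
            for gold, answer in zip(key, sample[i])
        )
        scores.append(misses / (len(key) * len(sampled_answers)))
    return scores


def detect(sentences: Sequence[str],
           targets: Sequence[str],
           answer_exam: Callable[[List[str]], List[List[str]]],
           n_samples: int = 5) -> List[float]:
    """Build the exam, collect n_samples answer sets, and score each sentence."""
    exam, keys = make_exam(sentences, targets)
    samples = [answer_exam(exam) for _ in range(n_samples)]
    return score_sentences(keys, samples)


# Toy usage: a fixed answerer stands in for repeated LLM sampling.
sentences = ["Ada Lovelace was born in Paris.", "She worked with Charles Babbage."]
targets = ["Paris", "Charles Babbage"]


def toy_answerer(exam: List[str]) -> List[List[str]]:
    return [["London"], ["Charles Babbage"]]


scores = detect(sentences, targets, toy_answerer, n_samples=3)
print(scores)  # [1.0, 0.0]
```

Because every answer fills blanks in the original sentences, each sampled answer stays aligned with the original storyline and can be compared blank by blank. A higher score means the sampled answers disagree more often with the original phrase.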

Paper Structure (34 sections, 10 figures, 1 table)

Figures (10)

  • Figure 1: Examples of the storyline-changing issue. Each text is generated with the original prompt. Each sentence is assigned a serial number, such as [s1]. Red bold indicates hallucinatory phrases. Yellow background indicates non-hallucinatory but incomparable phrases due to the regenerated texts with topic picking and snowballing.
  • Figure 2: An example of our FIBE approach with the original text in Figure 1. This exemplifies the steps to predict the hallucination score for each sentence in the original text. Bold underline in the exam answers indicates comparable phrases that were regenerated according to our expectations and that correspond to the hallucinatory or incomparable phrases in Figure 1.
  • Figure 3: Number of indicators that outperform SCGP* (resampled) when the 5 indicators in Table 1 are evaluated using only the first to $x$-th line of each text.
  • Figure 4: PR Curve - NonFact task
  • Figure 5: PR Curve - NonFact* task
  • ...and 5 more figures