Enhancing Hallucination Detection through Perturbation-Based Synthetic Data Generation in System Responses

Dongxu Zhang, Varun Gangal, Barrett Martin Lattimer, Yi Yang

TL;DR

This work tackles the costly challenge of training hallucination detectors for rapidly evolving LLMs by automatically generating both faithful and hallucinated outputs through a prompt-driven rewriting pipeline. By using a rewriting LLM (GPT-4) to perturb the target system's responses, the authors create a synthetic dataset that closely matches real-world LLM outputs and enables end-to-end training of a T5-base detector. Empirical results on OpenDialKG-Eval and BEGIN show the fine-tuned detector surpasses zero-shot methods and prior synthetic baselines in Macro-F1 and latency, while also reducing annotation costs. The approach yields richer hallucination coverage and demonstrates favorable data quality, with ablations confirming the necessity of including both faithful and hallucination data. Overall, the method offers a cost-effective, adaptable path to improving reliability and safety in LLM-based systems.

Abstract

Detecting hallucinations in large language model (LLM) outputs is pivotal, yet traditional fine-tuning for this classification task is impeded by the expensive and quickly outdated annotation process, especially across numerous vertical domains and in the face of rapid LLM advancements. In this study, we introduce an approach that automatically generates both faithful and hallucinated outputs by rewriting system responses. Experimental findings demonstrate that a T5-base model, fine-tuned on our generated dataset, surpasses state-of-the-art zero-shot detectors and existing synthetic generation methods in both accuracy and latency, indicating efficacy of our approach.
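The data flow described above (rewrite each system response into a faithful and a hallucinated variant, then score a detector with Macro-F1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `build_synthetic_dataset` and the `rewrite` callable are hypothetical names, and the stub rewriter in the usage lines stands in for the prompted GPT-4 rewriting step, whose prompts are not reproduced here. Only the Macro-F1 definition (the unweighted mean of per-class F1) is standard.

```python
from typing import Callable, List, Sequence, Tuple

FAITHFUL = "faithful"
HALLUCINATED = "hallucinated"


def build_synthetic_dataset(
    responses: Sequence[str],
    rewrite: Callable[[str, str], str],
) -> List[Tuple[str, str]]:
    """Pair each system response with one rewrite per target label.

    `rewrite(response, label)` is a hypothetical hook; in the paper this
    role is played by a prompted rewriting LLM.
    """
    dataset = []
    for response in responses:
        for label in (FAITHFUL, HALLUCINATED):
            dataset.append((rewrite(response, label), label))
    return dataset


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    return sum(scores) / len(scores)


# Usage with a stub rewriter (an assumption for illustration only).
def stub_rewrite(response: str, label: str) -> str:
    return f"[{label}] {response}"


toy_data = build_synthetic_dataset(["The film was released in 1999."], stub_rewrite)
toy_gold = [FAITHFUL, FAITHFUL, HALLUCINATED, HALLUCINATED]
toy_pred = [FAITHFUL, HALLUCINATED, HALLUCINATED, HALLUCINATED]
toy_score = macro_f1(toy_gold, toy_pred)
```

Generating both labels for every response keeps the two classes balanced, which matches the summary's point that both faithful and hallucinated data are needed.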

Paper Structure (27 sections, 3 figures, 13 tables)

Figures (3)

  • Figure 1: Overview of our automatic hallucination generation pipeline. Red and green highlights mark hallucinated and faithful claims, respectively.
  • Figure 2: Spider plot showing how the synthesized hallucinations from our approach (Ours, green), two baselines (HaluEval, blue; FADE, red), and the system responses (System, purple) distribute over the six qualitative categories laid out in the hallucination patterns subsection. Both HaluEval and FADE show a marked skew towards "Add new entity", while Ours aligns more closely with System.
  • Figure 3: A snapshot of how the initial instructions and examples section of the template would appear to an annotator doing a HIT for our annotation task.