Hallucination Detection and Hallucination Mitigation: An Investigation

Junliang Luo, Tianyu Li, Di Wu, Michael Jenkin, Steve Liu, Gregory Dudek

TL;DR

This survey addresses the pervasive issue of hallucinations in large language models by detailing a taxonomy of detection and mitigation approaches. It covers token- and sentence-level detectors, data-driven benchmarks, and a spectrum of mitigation strategies including retrieval augmentation, knowledge grounding, control codes, and contrastive learning. The work highlights representative datasets (e.g., HADES, OpenDialKG, SummaC, XENT) and methods (NPH, RHO, ALIGNSCORE, HaRiM+, MixCL, HERMAN) that improve factuality and faithfulness across question answering, summarization, and dialogue tasks. By assembling these techniques and reproducing key results, the paper informs practical deployment of LLMs in real-world settings and points to future work needed to meaningfully reduce hallucinations at scale.

Abstract

Large language models (LLMs), including ChatGPT, Bard, and Llama, have achieved remarkable success over the last two years across a range of applications. Despite this success, several concerns limit their wider adoption. A key one is hallucination: alongside correct responses, LLMs can also generate responses that seem plausible but are factually incorrect. This report presents a comprehensive review of the current literature on both hallucination detection and hallucination mitigation. We hope it can serve as a useful reference for engineers and researchers who are interested in LLMs and in applying them to real-world tasks.

Paper Structure (42 sections, 1 equation, 15 figures, 18 tables)

Figures (15)

  • Figure 3.1: Example of factual hallucinations in a BART-generated summary on the XSum dataset (narayan-etal-2018-dont). Neither the title "European Commission President" nor the first name "Jean-Claude" is mentioned in the document, but both are factual. (Figure source: dziri2021neural)
  • Figure 3.2: Generation of synthetic data with hallucination labels. A hallucinated version of the original text is generated by feeding the noised sentence to the encoder-decoder model BART. Hallucination labels are assigned to each token by computing the edit distance between the hallucinated text and the original one. Labels of $1$ refer to hallucinated words. (Figure source: dziri2021neural)
  • Figure 3.3: Fine-tuning XLM-RoBERTa (for cross-lingual generation tasks, e.g., MT) or RoBERTa (for monolingual generation tasks, e.g., text summarization) on the synthetic training data. (Figure source: zhou2021detecting)
  • Figure 3.4: Flowchart of the data annotation process. (Figure source: manakul2023selfcheckgpt)
  • Figure 3.5: SelfCheckGPT with Question Answering. (Figure source: manakul2023selfcheckgpt)
  • ...and 10 more figures
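
The token labeling step described in the caption of Figure 3.2 can be illustrated with a minimal sketch. It assumes whitespace tokenization and a standard Levenshtein alignment; the function name `hallucination_labels` and the example sentences are hypothetical, and the BART-based generation of the hallucinated text is not reproduced here. The cited work may align tokens differently in its details.

```python
from typing import List


def hallucination_labels(original: List[str], hallucinated: List[str]) -> List[int]:
    """Label each token of `hallucinated` with 0 if a minimum edit-distance
    alignment matches it to an identical token of `original`, else 1.
    (Hypothetical helper, written for illustration only.)"""
    n, m = len(original), len(hallucinated)
    # dist[i][j] = edit distance between original[:i] and hallucinated[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if original[i - 1] == hallucinated[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # token deleted from original
                dist[i][j - 1] + 1,         # token inserted in hallucinated
                dist[i - 1][j - 1] + cost,  # match or substitution
            )

    # Trace the alignment back; every token starts as hallucinated (1)
    # and is cleared to 0 only when it is matched exactly.
    labels = [1] * m
    i, j = n, m
    while i > 0 and j > 0:
        same = original[i - 1] == hallucinated[j - 1]
        if same and dist[i][j] == dist[i - 1][j - 1]:
            labels[j - 1] = 0  # exact match
            i, j = i - 1, j - 1
        elif dist[i][j] == dist[i - 1][j - 1] + 1:
            i, j = i - 1, j - 1  # substitution: label stays 1
        elif dist[i][j] == dist[i][j - 1] + 1:
            j -= 1  # inserted token: label stays 1
        else:
            i -= 1  # token deleted from original: nothing to label
    return labels


# Illustrative sentences only; they are not taken from any dataset.
source = "the cat sat on the mat".split()
noised = "the dog sat on the red mat".split()
print(list(zip(noised, hallucination_labels(source, noised))))
```

In this toy example the substituted token "dog" and the inserted token "red" receive label 1, and all other tokens receive label 0, which mirrors the caption's convention that labels of 1 mark hallucinated words.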