Fine-grained Hallucination Detection and Editing for Language Models

Abhika Mishra; Akari Asai; Vidhisha Balachandran; Yizhong Wang; Graham Neubig; Yulia Tsvetkov; Hannaneh Hajishirzi

TL;DR

This paper introduces a fine-grained taxonomy and a pair of tasks for detecting and editing hallucinations in language models, addressing limitations of binary or entity-level approaches. It contributes Fava, a retrieval-augmented editing model trained on a large synthetic corpus to identify and fix factual errors at the span level, and FavaBench, the first human-annotated benchmark of its kind with ~1k detailed annotations across multiple models. Empirical results show that Fava outperforms strong baselines on both fine-grained detection and editing, with retrieval-guided evidence improving factuality. The findings underscore the importance of span-level grounding and context retrieval for robust information-seeking LM deployments, while highlighting remaining challenges for unverifiable and invented error types.
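
To make span-level detection and editing concrete, here is a minimal Python sketch of consuming an editor's output. It assumes a hypothetical inline-tag format in which each flagged span is wrapped in a tag named after its hallucination type, with <delete> around the erroneous text and <mark> around a suggested replacement. The tag format, DetectedSpan, and parse_and_edit are illustrative only and not taken from the Fava release; the six type names are assumed to be the categories of the taxonomy.

```python
import re
from dataclasses import dataclass

# Assumed: the six taxonomy types, used as tag names in the editor's output.
HALLUCINATION_TYPES = ("entity", "relation", "contradictory",
                       "invented", "subjective", "unverifiable")

# A flagged span: <type>...</type>, with the closing tag matching the opening one.
SPAN_RE = re.compile(
    r"<(?P<type>" + "|".join(HALLUCINATION_TYPES) + r")>(?P<body>.*?)</(?P=type)>",
    re.DOTALL,
)
DELETE_RE = re.compile(r"<delete>(.*?)</delete>", re.DOTALL)
MARK_RE = re.compile(r"<mark>(.*?)</mark>", re.DOTALL)


@dataclass
class DetectedSpan:
    error_type: str
    deleted: str    # text flagged as hallucinated
    suggested: str  # proposed replacement ("" means delete without replacing)


def parse_and_edit(tagged: str) -> tuple[list[DetectedSpan], str]:
    """Return the flagged spans and the text with every suggested edit applied."""
    spans: list[DetectedSpan] = []

    def apply_edit(match: re.Match) -> str:
        body = match.group("body")
        spans.append(DetectedSpan(
            error_type=match.group("type"),
            deleted="".join(DELETE_RE.findall(body)),
            suggested="".join(MARK_RE.findall(body)),
        ))
        # Drop deleted text, unwrap suggested text, keep anything else in the span.
        return MARK_RE.sub(r"\1", DELETE_RE.sub("", body))

    edited = SPAN_RE.sub(apply_edit, tagged)
    # Removing whole spans can leave doubled or trailing spaces.
    return spans, re.sub(r" {2,}", " ", edited).strip()
```

For example, the input "The Eiffel Tower is in <entity><delete>Berlin</delete><mark>Paris</mark></entity>." yields one entity span and the edited text "The Eiffel Tower is in Paris." Keeping the deleted text alongside the suggestion lets one output serve both tasks: the spans answer detection, and the edited string answers editing.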

Abstract

Large language models (LMs) are prone to generating factual errors, which are often called hallucinations. In this paper, we introduce a comprehensive taxonomy of hallucinations and argue that hallucinations manifest in diverse forms, each requiring a different degree of careful assessment to verify factuality. We propose a novel task of automatic fine-grained hallucination detection and construct a new evaluation benchmark, FavaBench, that includes about one thousand fine-grained human judgments on outputs from three LMs across various domains. Our analysis reveals that ChatGPT and Llama2-Chat (70B, 7B) exhibit diverse types of hallucinations in the majority of their outputs in information-seeking scenarios. We train FAVA, a retrieval-augmented LM, on carefully created synthetic data to detect and correct fine-grained hallucinations. On our benchmark, our automatic and human evaluations show that FAVA significantly outperforms ChatGPT and GPT-4 on fine-grained hallucination detection, and that edits suggested by FAVA improve the factuality of LM-generated text.
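
Fine-grained detection can be scored in several ways. One simple scheme, assumed here for illustration rather than taken from the paper, reduces each annotation to (sentence index, hallucination type) pairs and computes a separate F1 per type. The Labels encoding and per_type_f1 are hypothetical names.

```python
from collections.abc import Iterable

# Hypothetical encoding: a set of (sentence index, hallucination type) pairs,
# meaning "sentence i contains at least one hallucination of this type".
Labels = set[tuple[int, str]]


def per_type_f1(gold: Labels, pred: Labels, types: Iterable[str]) -> dict[str, float]:
    """Sentence-level F1 for each hallucination type (0.0 when undefined)."""
    scores: dict[str, float] = {}
    for t in types:
        gold_t = {i for i, ty in gold if ty == t}
        pred_t = {i for i, ty in pred if ty == t}
        tp = len(gold_t & pred_t)
        precision = tp / len(pred_t) if pred_t else 0.0
        recall = tp / len(gold_t) if gold_t else 0.0
        denom = precision + recall
        scores[t] = 2 * precision * recall / denom if denom else 0.0
    return scores
```

With gold labels {(0, "entity"), (2, "relation")} and predictions {(0, "entity"), (1, "entity")}, entity scores 2/3 (precision 0.5, recall 1.0) and relation scores 0.0. Scoring each type separately keeps a detector from looking strong overall while missing the harder categories entirely.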

Paper Structure (37 sections, 2 equations, 7 figures, 18 tables)

Introduction
Related Work
Fine-grained Hallucination Detection
Hallucination Taxonomy
Tasks and Metrics
Benchmark: FavaBench
Model: Fava
Synthetic Training Data Curation
Training and Inference
Experiments
Experiments for Hallucination Detection
Experiments for Hallucination Editing
Human Evaluations
Results and Analysis
Results
...and 22 more sections

Figures (7)

Figure 1: Overview of our taxonomy, fine-grained hallucination detection task, and Fava.
Figure 2: An overview of our fine-grained hallucination taxonomy. We identify 6 fine-grained types representing diverse hallucinations in LM-generated text.
Figure 3: Distribution of hallucination types in ChatGPT, Llama2-Chat-7B and Llama2-Chat-70B outputs across four datasets of diverse information-seeking queries.
Figure 4: Overview of high-quality synthetic data generation process in Fava. Fava leverages powerful instruction-tuned models to carefully insert errors into factually accurate statements and produces diverse error types based on our proposed taxonomy.
Figure 5: Annotation interface.
...and 2 more figures
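
Figure 4 describes building training data by inserting errors into factually accurate text. The sketch below substitutes a simple rule-based entity swap for the instruction-tuned model the caption refers to, so it produces only one error type; insert_entity_error, the TrainingPair fields, and the tag format are hypothetical. Each pair maps the corrupted passage (the editor's input) to the original passage with the inserted error tagged and its correction marked (the editor's target). Retrieved evidence, which a retrieval-augmented editor would also receive, is omitted.

```python
import random
from dataclasses import dataclass


@dataclass
class TrainingPair:
    model_input: str  # passage containing one inserted error
    target: str       # original passage with the error tagged and its fix marked


def insert_entity_error(passage: str, entity: str, distractors: list[str],
                        rng: random.Random) -> TrainingPair:
    """Swap the first occurrence of `entity` for a random distractor (hypothetical helper)."""
    if entity not in passage:
        raise ValueError(f"{entity!r} does not occur in the passage")
    candidates = [d for d in distractors if d != entity]
    if not candidates:
        raise ValueError("need at least one distractor different from the entity")
    wrong = rng.choice(candidates)
    # Editor input: the passage as an LM might have (wrongly) generated it.
    corrupted = passage.replace(entity, wrong, 1)
    # Editor target: flag the wrong entity for deletion and mark the correct one.
    tagged = f"<entity><delete>{wrong}</delete><mark>{entity}</mark></entity>"
    target = passage.replace(entity, tagged, 1)
    return TrainingPair(model_input=corrupted, target=target)
```

Because the starting passage is known to be accurate, the correct label comes for free. An LM-based inserter can additionally produce error types that a string swap cannot, such as invented or subjective statements.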
