Multilingual Fine-Grained News Headline Hallucination Detection

Jiaming Shen, Tianqi Liu, Jialu Liu, Zhen Qin, Jay Pavagadhi, Simon Baumgartner, Michael Bendersky

TL;DR

This paper tackles hallucination in multilingual news headlines by introducing MFHHD, the first dataset to provide fine-grained, language-aware entailment annotations for 11,469 article-headline pairs across five languages. It demonstrates that supervised fine-tuning benefits from natural language inference pretraining and incorporating explanations, with the best model (mT5_xxl + NLI + Exp) reaching around 74% coarse accuracy and about 67% Example-F1 for fine-grained detection. In the few-shot setting, the authors propose language-dependent demonstration selection and coarse-to-fine prompting to improve in-context learning, showing improvements for PaLM2-L and GPT-4, though these methods still lag behind the best supervised approaches. The work contributes a valuable resource and actionable insights for multilingual, fine-grained headline hallucination detection, with practical implications for improving the faithfulness of automated headlines across languages.

Abstract

The popularity of automated news headline generation has surged with advancements in pre-trained language models. However, these models often suffer from the "hallucination" problem, where the generated headline is not fully supported by its source article. Efforts to address this issue have predominantly focused on English, using over-simplistic classification schemes that overlook nuanced hallucination types. In this study, we introduce the first multilingual, fine-grained news headline hallucination detection dataset that contains over 11 thousand pairs in 5 languages, each annotated with detailed hallucination types by experts. We conduct extensive experiments on this dataset under two settings. First, we implement several supervised fine-tuning approaches as preparatory solutions and demonstrate this dataset's challenges and utilities. Second, we test various large language models' in-context learning abilities and propose two novel techniques, language-dependent demonstration selection and coarse-to-fine prompting, to boost the few-shot hallucination detection performance in terms of the example-F1 metric. We release this dataset to foster further research in multilingual, fine-grained headline hallucination detection.
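The abstract reports fine-grained results with the example-F1 metric but does not define it here. As a minimal sketch, assuming the common multi-label definition (an F1 score computed per example from its gold and predicted label sets, then averaged over examples), it could be computed as below. The function name `example_f1` and the handling of empty label sets are assumptions for illustration; the paper's exact formulation may differ. The label strings are taken from the Figure 1 caption.

```python
from typing import Sequence, Set


def example_f1(gold: Sequence[Set[str]], pred: Sequence[Set[str]]) -> float:
    """Average per-example F1 between gold and predicted label sets.

    Assumption: an example whose gold and predicted sets are both empty
    counts as a perfect match (score 1.0).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of examples")
    if not gold:
        return 0.0

    total = 0.0
    for g, p in zip(gold, pred):
        if not g and not p:
            total += 1.0
            continue
        # Per-example F1 simplifies to 2 * |G ∩ P| / (|G| + |P|).
        total += 2 * len(g & p) / (len(g) + len(p))
    return total / len(gold)


# Hypothetical data: two headlines with gold and predicted fine-grained labels.
gold_labels = [
    {"Unsupported Information", "Missing Key Information"},
    {"Unsupported Information"},
]
pred_labels = [
    {"Unsupported Information"},
    {"Unsupported Information"},
]
score = example_f1(gold_labels, pred_labels)
```

In this hypothetical case the first headline gets partial credit for recovering one of its two gold labels, and the second is an exact match, so the metric rewards partially correct multi-label predictions rather than requiring every label set to match exactly.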

Paper Structure (23 sections, 1 equation, 5 figures, 6 tables)

This paper contains 23 sections, 1 equation, 5 figures, 6 tables.

Figures (5)

  • Figure 1: A comparative example of headline hallucination detection at different levels of granularity. The fine-grained hallucination detector goes beyond the traditional 3-class label Neutral and offers more nuanced predictions such as Unsupported Information (because the article does not reference "Europe") and Missing Key Information (because the headline omits the crucial detail that "new routes in the Midwest are being added").
  • Figure 2: Analysis of our multilingual fine-grained headline hallucination detection (MFHHD) dataset.
  • Figure 3: Detecting news headline hallucinations with models of the encoder-decoder architecture.
  • Figure 4: Detecting fine-grained headline hallucinations using an LLM with language-dependent demonstration selection and coarse-to-fine prompting.
  • Figure 5: Fine-grained headline hallucination labels with illustrative examples.