Detecting and Mitigating Hallucinations in Machine Translation: Model Internal Workings Alone Do Well, Sentence Similarity Even Better

David Dale, Elena Voita, Loïc Barrault, Marta R. Costa-jussà

TL;DR

Hallucinations in neural machine translation are rare but have a severe impact on users. The authors show that internal model signals (ALTI+) detect the most severe hallucinations about twice as accurately as prior methods and match external baselines for mitigation in a detect-then-rewrite framework. When external signals are allowed, cross-lingual sentence similarity measures such as LaBSE and XNLI substantially improve both detection and reranking, with LaBSE providing the strongest reranking gains. In short, the model's internal workings alone yield significant improvements, and semantic similarity signals add further gains, enabling hallucination mitigation across language pairs without external quality estimators.

Abstract

While the problem of hallucinations in neural machine translation has long been recognized, progress on alleviating it has so far been limited. Indeed, it recently turned out that without artificially encouraging models to hallucinate, previously existing methods fall short and even the standard sequence log-probability is more informative. This means that characteristics internal to the model can give much more information than we expect, and before using external models and measures, we first need to ask: how far can we go if we use nothing but the translation model itself? We propose to use a method that evaluates the percentage of the source contribution to a generated translation. Intuitively, hallucinations are translations "detached" from the source, hence they can be identified by low source contribution. This method improves detection accuracy for the most severe hallucinations by a factor of 2 and is able to alleviate hallucinations at test time on par with the previous best approach that relies on external models. Next, if we move away from internal model characteristics and allow external tools, we show that using sentence similarity from cross-lingual embeddings further improves these results.
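
To make the idea of "low source contribution" concrete, here is a minimal Python sketch. It is not the authors' implementation: it assumes that some attribution method has already produced, for each generated token, a row of contributions over the source tokens followed by the target-prefix tokens, with each row summing to 1. The function names, the averaging over tokens, and the 0.5 threshold are hypothetical choices made for illustration.

```python
from typing import List


def source_contribution(contribs: List[List[float]], n_src: int) -> float:
    """Average share of source contribution over the generated tokens.

    contribs[t][j] is the (assumed, precomputed) contribution of input
    token j to generated token t. The first n_src columns are source
    tokens; the remaining columns are target-prefix tokens. Each row is
    assumed to sum to 1.
    """
    if not contribs:
        raise ValueError("need at least one generated token")
    # Per generated token: total contribution coming from the source side.
    per_token = [sum(row[:n_src]) for row in contribs]
    return sum(per_token) / len(per_token)


def is_suspected_hallucination(
    contribs: List[List[float]], n_src: int, threshold: float = 0.5
) -> bool:
    """Flag a translation whose source contribution falls below a threshold.

    The threshold value is a hypothetical placeholder; in practice it
    would be tuned on annotated data.
    """
    return source_contribution(contribs, n_src) < threshold
```

A translation that draws mostly on the source keeps a high score, while one that mostly continues its own prefix scores low and is flagged. The score can also be used directly to rank translations instead of applying a hard threshold.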

Paper Structure

This paper contains 43 sections, 7 figures, 6 tables.

Figures (7)

  • Figure 1: Taxonomy of translation types (based on the dataset by Guerreiro et al.).
  • Figure 2: Kernel density estimation of the distribution of the detection criteria by translation pathology type.
  • Figure 3: Distribution of translation types when selecting the worst 10% of the dataset according to each metric. In the original dataset, these types were annotated in a multilabel manner (e.g. the same translation could be annotated both as oscillatory hallucination and as a named entity error). To assign a single label to each translation, we choose the most severe pathology type (with severity increasing clockwise from "Correct translations" to "Fully detached").
  • Figure 4: Recalls by translation types when selecting the worst 20% of the dataset according to each metric. Here, the types are presented in a multilabel manner, i.e. one translation may contribute to multiple axes.
  • Figure 5: For all combinations of a generation strategy and a reranker, heatmaps show scores for the final translations.
  • ...and 2 more figures
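
The selection described in the captions of Figures 3 and 4 (take the worst fraction of the dataset according to a metric, then measure how much of each pathology type it captures) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: it assumes that lower scores mean worse translations, that labels are multilabel sets as in Figure 4, and the function names and label strings are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set


def worst_fraction(scores: Sequence[float], fraction: float) -> List[int]:
    """Indices of the lowest-scoring `fraction` of items.

    Assumes lower score = worse translation; negate the scores first for
    metrics where higher means worse.
    """
    k = math.ceil(len(scores) * fraction)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return order[:k]


def recall_by_type(
    scores: Sequence[float], labels: Sequence[Set[str]], fraction: float
) -> Dict[str, float]:
    """Per-type recall within the worst `fraction` of the dataset.

    labels[i] is the set of pathology types annotated for translation i
    (empty for a correct translation), so one translation may count
    towards several types.
    """
    selected = set(worst_fraction(scores, fraction))
    all_types = sorted(set().union(*labels))
    recalls = {}
    for t in all_types:
        with_type = [i for i, lab in enumerate(labels) if t in lab]
        hits = sum(1 for i in with_type if i in selected)
        recalls[t] = hits / len(with_type)
    return recalls
```

Running `recall_by_type` once per detection metric gives values of the kind plotted on the axes of Figure 4: a metric is better at catching a pathology type when more of that type falls inside the selected worst fraction.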