Enhanced Hallucination Detection in Neural Machine Translation through Simple Detector Aggregation
Anas Himmi, Guillaume Staerman, Marine Picot, Pierre Colombo, Nuno M. Guerreiro
TL;DR
This paper tackles hallucination detection in neural machine translation by leveraging detector complementarities through STARE, a simple unsupervised aggregation that normalizes and weights multiple detectors. The method aggregates both external proxies (e.g., quality estimators, cross-lingual similarities) and internal model signals (e.g., Seq-Logprob, attention-based metrics) to produce a single robust hallucination score $\text{Agg}(x') = \sum_{k=1}^{K} w_k s_k(x')$, where $s_k(x')$ is the score that detector $k$ assigns to translation $x'$ and $w_k$ is its weight. Across two human-annotated benchmarks, LfaN-Hall and HalOmi, STARE consistently outperforms individual detectors and other baselines, with notable gains when combining internal detectors, whose aggregate can surpass external-only aggregates. The work provides extensive ablations on detector selection and reference-set size, demonstrates robustness to calibration data, and releases code and scores to foster reproducibility and further research. Overall, STARE offers a practical, effective route to more reliable NMT systems by exploiting detector complementarities in an unsupervised fashion.
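To make the weighted sum concrete, here is a minimal sketch of a normalize-then-weight aggregation in Python. It is an illustration under stated assumptions, not the authors' released implementation: the min-max normalization against a per-detector reference set, the uniform default weights, the convention that a higher score means "more likely hallucinated", and the names `min_max_normalize` and `aggregate` are all hypothetical choices made for this example.

```python
from typing import Optional, Sequence


def min_max_normalize(score: float, reference: Sequence[float]) -> float:
    """Rescale one detector score using the range observed on a reference set.

    Assumption: min-max scaling is used here only for illustration.
    """
    lo, hi = min(reference), max(reference)
    if hi == lo:
        # Degenerate reference set: the detector carries no usable range.
        return 0.0
    return (score - lo) / (hi - lo)


def aggregate(
    scores: Sequence[float],
    references: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Compute Agg(x') = sum_k w_k * s_k(x') over K normalized detector scores.

    scores[k]     : raw score of detector k on the translation x'
    references[k] : scores of detector k on a reference (calibration) set
    weights[k]    : weight w_k; uniform weights are assumed when omitted
    """
    k = len(scores)
    if len(references) != k:
        raise ValueError("one reference set is required per detector")
    if weights is None:
        weights = [1.0 / k] * k  # assumption: equal weighting by default
    if len(weights) != k:
        raise ValueError("one weight is required per detector")
    normalized = [
        min_max_normalize(s, ref) for s, ref in zip(scores, references)
    ]
    return sum(w * n for w, n in zip(weights, normalized))


# Two hypothetical detectors on very different scales.
example = aggregate(
    scores=[0.5, 10.0],
    references=[[0.0, 1.0], [0.0, 20.0]],
)
```

Normalizing before summing keeps a detector with a large numeric range from dominating one with a small range, which is the practical reason a plain sum of raw scores would not work well.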
Abstract
Hallucinated translations pose significant threats and safety concerns when it comes to the practical deployment of machine translation systems. Previous research has identified that detectors exhibit complementary performance: different detectors excel at detecting different types of hallucinations. In this paper, we propose to address the limitations of individual detectors by combining them, and we introduce a straightforward method for aggregating multiple detectors. Our results demonstrate the efficacy of our aggregated detector, providing a promising step towards ever more reliable machine translation systems.
