Exposing the Cracks: Vulnerabilities of Retrieval-Augmented LLM-based Machine Translation
Yanming Sun, Runzhe Zhan, Chi Seng Cheang, Han Wu, Xuebo Liu, Yuyao Niu, Fengying Ye, Kaixin Lan, Lidia S. Chao, Derek F. Wong
TL;DR
This study systematically analyzes the vulnerabilities of Retrieval-Augmented LLM-based Machine Translation (REAL-MT) under noisy retrieval. It introduces a controlled noise synthesis framework and two metrics, Fidelity and Context Adoption Rate (CAR), which quantify idiom-translation fidelity and reliance on retrieved context across language-resource levels. Experiments with Qwen-series LLMs and Claude show that semantic noise degrades translation quality, especially for low-resource languages, and that large reasoning models tend to rationalize incorrect contexts: their confidence rises while their attention shifts toward the retrieved content. Training-free and fine-tuning mitigation strategies improve robustness but reduce performance in clean settings, highlighting the need for self-verifying integration mechanisms that validate retrieved content before it is used.
Abstract
REtrieval-Augmented LLM-based Machine Translation (REAL-MT) shows promise for knowledge-intensive tasks like idiomatic translation, but its reliability under noisy retrieval contexts remains poorly understood despite this being a common challenge in real-world deployment. To address this gap, we propose a noise synthesis framework and new metrics to evaluate the robustness of REAL-MT systematically. Using this framework, we instantiate REAL-MT with Qwen-series models, including standard LLMs and large reasoning models (LRMs) with enhanced reasoning, and evaluate their performance on idiomatic translation across high-, medium-, and low-resource language pairs under synthesized noise. Our results show that low-resource language pairs, which rely more heavily on retrieved context, degrade more severely under noise than high-resource ones and often produce nonsensical translations. Although LRMs possess enhanced reasoning capabilities, they show no improvement in error correction and are even more susceptible to noise, tending to rationalize incorrect contexts. We find that this stems from an attention shift away from the source idiom to noisy content, while confidence increases despite declining accuracy, indicating poor calibration. To mitigate these issues, we investigate training-free and fine-tuning strategies, which improve robustness at the cost of performance in clean contexts, revealing a fundamental trade-off. Our findings highlight the limitations of current approaches, underscoring the need for self-verifying integration mechanisms.
