Using Source-Side Confidence Estimation for Reliable Translation into Unfamiliar Languages
Kenneth J. Sible, David Chiang
TL;DR
The paper addresses translation into target languages the user cannot read by moving the confidence signal to the source side. It introduces a gradient-based, alignment-free confidence estimator that measures how sensitive the output probability is to each source embedding, formalized as $U(x_i)=\sum_{k=1}^{|\mathbf{x}_i|}\left|\frac{\partial\mathbb{P}(y_1,\ldots,y_m\mid x_1,\ldots,x_n)}{\partial\mathbf{x}_i^k}\right|$, where $\mathbf{x}_i$ is the embedding of source subword $x_i$ and $\mathbf{x}_i^k$ is its $k$-th component. Subword uncertainties are aggregated, and an intuitive thresholding strategy decides which source words to flag. An interactive MT system highlights the flagged source words and proposes edits, while an evaluation framework uses GPT-4o to produce mistranslation annotations at scale and reports F1 and AUC. The results show that the proposed method outperforms alignment-based baselines. The work also demonstrates practical applications, including a mobile-ready web app, and outlines plans for broader language coverage and dictionary-assisted corrections. Overall, it advances transparent, user-guided MT by shifting the confidence signal to the source side and enabling targeted user intervention.
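To make the definition of $U(x_i)$ concrete, here is a minimal Python sketch under loud assumptions. It is not the authors' implementation. The function `toy_output_prob` is a hypothetical stand-in for $\mathbb{P}(y_1,\ldots,y_m\mid x_1,\ldots,x_n)$, not a real MT model, and the partial derivatives are approximated with central finite differences rather than backpropagation. Summing subword scores into word scores and the fixed `threshold` are also assumptions, since the summary does not specify the aggregation or the thresholding rule.

```python
import math

def toy_output_prob(embeddings):
    """Hypothetical stand-in for P(y_1..y_m | x_1..x_n). NOT a real MT model:
    a sigmoid over position-weighted tanh features of the source embeddings."""
    score = 0.0
    for i, vec in enumerate(embeddings):
        weight = 1.0 / (i + 1)  # arbitrary toy weights, for illustration only
        score += weight * sum(math.tanh(v) for v in vec)
    return 1.0 / (1.0 + math.exp(-score))

def subword_uncertainty(prob_fn, embeddings, i, eps=1e-5):
    """U(x_i): sum over embedding dimensions k of |dP / dx_i^k|.
    Derivatives are approximated by central finite differences."""
    total = 0.0
    for k in range(len(embeddings[i])):
        plus = [list(v) for v in embeddings]
        minus = [list(v) for v in embeddings]
        plus[i][k] += eps
        minus[i][k] -= eps
        total += abs(prob_fn(plus) - prob_fn(minus)) / (2 * eps)
    return total

def word_uncertainties(prob_fn, embeddings, word_ids):
    """Aggregate subword scores per source word by summing (an assumption).
    word_ids[i] is the index of the word that subword i belongs to."""
    scores = {}
    for i, word in enumerate(word_ids):
        u = subword_uncertainty(prob_fn, embeddings, i)
        scores[word] = scores.get(word, 0.0) + u
    return scores

def flag_uncertain(scores, threshold):
    """Return the sorted word indices whose score exceeds a hypothetical threshold."""
    return sorted(word for word, s in scores.items() if s > threshold)

# Three subwords with 2-dimensional embeddings; the first two form word 0.
embeddings = [[0.1, -0.2], [0.3, 0.0], [2.0, 2.0]]
word_ids = [0, 0, 1]
scores = word_uncertainties(toy_output_prob, embeddings, word_ids)
flagged = flag_uncertain(scores, threshold=0.1)
```

With a real model, `subword_uncertainty` would normally be replaced by a single backward pass through an automatic differentiation framework, which yields all partial derivatives at once instead of two forward passes per embedding dimension.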
Abstract
We present an interactive machine translation (MT) system designed for users who are not proficient in the target language. It aims to improve trustworthiness and explainability by identifying potentially mistranslated words and allowing the user to intervene to correct mistranslations. However, confidence estimation in machine translation has traditionally focused on the target side. Whereas the conventional approach to source-side confidence estimation would have been to project target word probabilities to the source side via word alignments, we propose a direct, alignment-free approach that measures how sensitive the target word probabilities are to changes in the source embeddings. Experimental results show that our method outperforms traditional alignment-based methods at detection of mistranslations.
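For contrast, the conventional alignment-based approach mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `project_to_source` is hypothetical, alignments are assumed to arrive as `(source_index, target_index)` pairs from some external word aligner, and taking the minimum probability over aligned target words is one possible aggregation, not necessarily the one used in the paper's baselines.

```python
def project_to_source(target_probs, alignments, n_source):
    """Project target word probabilities onto source words via word alignments.

    target_probs: probability the model assigned to each target word.
    alignments:   (source_index, target_index) pairs from a word aligner.
    n_source:     number of source words.
    Returns one confidence per source word, or None if the word is unaligned.
    """
    aligned = [[] for _ in range(n_source)]
    for src, tgt in alignments:
        aligned[src].append(target_probs[tgt])
    # Assumption: a source word is only as confident as its least
    # confident aligned target word.
    return [min(probs) if probs else None for probs in aligned]

confidences = project_to_source(
    target_probs=[0.9, 0.4, 0.8],
    alignments=[(0, 0), (1, 1), (1, 2)],
    n_source=3,
)
```

In this sketch, a source word with no aligned target word receives no score at all, and every score depends on the quality of the alignments. The gradient-based approach avoids both dependencies by working directly on the source embeddings.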
