Context-Aware Machine Translation with Source Coreference Explanation

Huy Hien Vu; Hidetaka Kamigaito; Taro Watanabe

TL;DR

This paper addresses a weakness of context-aware machine translation: models often fail to exploit long-range contextual features. It introduces a coreference-explanation sub-model that predicts coreference clusters in the input in order to explain translation decisions. The translation model and the coreference-resolution component are trained jointly with the loss $\mathcal{L} = \mathcal{L}_{MT} + \alpha \mathcal{L'}_{Coref}$, where $p(\boldsymbol{y}|\boldsymbol{x})$ models the translation $\boldsymbol{y}$ of the input $\boldsymbol{x}$ and $p(\boldsymbol{\mathcal{C}}|\boldsymbol{y},\boldsymbol{x})$ models the coreference clusters $\boldsymbol{\mathcal{C}}$ given both; hypotheses can optionally be reranked with a coreference-based score. Empirically, the method improves over strong baselines on English–Russian, English–German, and multilingual TED data, with BLEU gains exceeding 1.0 points and corroborating gains in BARTScore and COMET. The improvements persist across varying context sizes and corpus scales. The work advances context-aware MT by incorporating interpretable coreference signals, which offers practical gains for document-level translation and lays groundwork for more transparent translation systems. Code and hyperparameters are released for reproducibility.
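The joint objective and the optional reranking step can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `beta` weight, and the choice to combine the two log-probabilities by a weighted sum are assumptions made here for illustration. The default `alpha=4.0` is taken from the value reported for the En-De experiments in Figure 3.

```python
from typing import List, Tuple


def joint_loss(mt_loss: float, coref_loss: float, alpha: float = 4.0) -> float:
    """Combine the two training losses as L = L_MT + alpha * L_Coref."""
    return mt_loss + alpha * coref_loss


def rerank(hypotheses: List[Tuple[str, float, float]], beta: float = 1.0) -> str:
    """Pick the best hypothesis from an N-best list.

    Each hypothesis is (text, log p(y|x), log p(C|y,x)).
    The weighted-sum score and the `beta` weight are assumptions of this
    sketch, not details confirmed by the paper summary.
    """
    best = max(hypotheses, key=lambda h: h[1] + beta * h[2])
    return best[0]


if __name__ == "__main__":
    # Toy numbers only; they do not come from the paper.
    print(joint_loss(mt_loss=2.0, coref_loss=0.5))
    n_best = [("hypothesis a", -1.0, -3.0), ("hypothesis b", -1.5, -0.5)]
    print(rerank(n_best))
```

With `beta=0.0` the reranker falls back to the translation score alone, which makes it easy to compare plain N-best selection against coreference-aware selection.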

Abstract

Despite significant improvements in translation quality, context-aware machine translation (MT) models underperform in many cases. One of the main reasons is that they fail to utilize the correct features from the context when the context is too long or the models are overly complex. This can lead to the explain-away effect, wherein the models consider only the features that most easily explain the predictions, resulting in inaccurate translations. To address this issue, we propose a model that explains the decisions made for translation by predicting coreference features in the input. We construct a model for input coreference by exploiting contextual features from both the input and translation output representations on top of an existing MT model. We evaluate and analyze our method on the English-German dataset of the WMT document-level translation task, the English-Russian dataset, and the multilingual TED talk dataset, demonstrating an improvement of over 1.0 BLEU score compared with other context-aware models.

Paper Structure (33 sections, 13 equations, 6 figures, 11 tables)

Introduction
Backgrounds
Transformer-based NMT
Context-Aware Transformer-based NMT
Coreference Resolution task
Context-Aware MT with Coreference Information
Architecture
Training
Inference
Experiments
Dataset
Experiment Settings
Translation setting
Baseline systems
Our systems
...and 18 more sections

Figures (6)

Figure 1: Entity heat maps of self-attentions: (a) Base Doc, (b) Trans+Enc and (c) Trans+Coref.
Figure 2: Entity heat maps of self-attentions in the coreference resolution sub-model.
Figure 3: Translation results on En-De datasets with different $m$-to-$m$ translation settings from $m=2$ to $m=4$. The result in the $m=1$ setting serves as the Base Sent reference. The $\alpha$ in the joint loss $\mathcal{L} = \mathcal{L}_{MT} + \alpha \mathcal{L'}_{Coref}$ is set to 4.0.
Figure 4: The results with N-best variants on the En-Ru dataset (Voita et al., 2019).
Figure 5: The results with N-best variants using the oracle BLEU metric on the En-Ru dataset (Voita et al., 2019).
...and 1 more figure
