
Does Neural Machine Translation Benefit from Larger Context?

Sebastien Jean, Stanislas Lauly, Orhan Firat, Kyunghyun Cho

TL;DR

This paper investigates whether neural machine translation can benefit from surrounding discourse by adding a context encoder and a second attention mechanism over preceding sentences. The proposed larger-context NMT (LC-NMT) extends the standard attention-based NMT architecture so that information from neighboring sentences is fused into decoding. In experiments on En-Fr and En-De, LC-NMT improves BLEU, RIBES, and cross-lingual pronoun prediction when trained on small corpora, but these gains largely disappear with larger training sets. This suggests that, given enough data, the model may learn the relevant word relations from the source sentence alone. The work also shows that LC-NMT performs pronoun prediction competitively with the top WMT'16 systems, pointing to the potential of discourse-aware translation when training data is limited, and calls for more targeted evaluation metrics to properly assess discourse effects.
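The "dual attention" idea can be illustrated with a toy decoder step: the decoder state attends separately to the annotations of the current source sentence and to those of the preceding (context) sentence, and the two resulting context vectors are combined before predicting the next word. This is a minimal sketch under loudly stated assumptions, not the authors' implementation: it uses plain dot-product scoring and simple concatenation as the fusion step, whereas the paper's actual scoring and fusion functions may differ, and the function names and vectors are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, annotations: List[Vector]) -> Tuple[List[float], Vector]:
    """Dot-product attention (an assumption for illustration):
    returns the attention weights and the weighted sum of annotations."""
    if not annotations:
        raise ValueError("need at least one annotation vector")
    weights = softmax([dot(query, h) for h in annotations])
    dim = len(annotations[0])
    context = [sum(w * h[i] for w, h in zip(weights, annotations)) for i in range(dim)]
    return weights, context


def dual_attention_step(decoder_state: Vector,
                        source_annotations: List[Vector],
                        context_annotations: List[Vector]) -> Vector:
    """Hypothetical decoder step: one attention over the current source
    sentence, one over the preceding sentence, fused by concatenation."""
    _, c_src = attend(decoder_state, source_annotations)
    _, c_ctx = attend(decoder_state, context_annotations)
    return c_src + c_ctx  # list concatenation = fused context vector


# Toy example: 2-dim annotations, two source tokens, one context token.
state = [1.0, 0.0]
src = [[1.0, 0.0], [0.0, 1.0]]
ctx = [[0.5, 0.5]]
fused = dual_attention_step(state, src, ctx)
print(fused)
```

In a real model the fused vector would feed into the decoder's recurrent update and output layer; keeping the two attentions separate lets the model weigh the preceding sentence independently of the current one.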

Abstract

We propose a neural machine translation architecture that models the surrounding text in addition to the source sentence. These models lead to better performance, both in terms of general translation quality and pronoun prediction, when trained on small corpora, although this improvement largely disappears when trained with a larger corpus. We also discover that attention-based neural machine translation is well suited for pronoun prediction and compares favorably with other approaches that were specifically designed for this task.

Paper Structure

This paper contains 15 sections, 6 equations, 3 tables.