Multi-Source Neural Translation

Barret Zoph, Kevin Knight

TL;DR

The paper tackles translation into English from two source languages at once. The second source helps resolve ambiguity in the first (triangulation), and the paper does this by directly modeling $P(e|f,g)$ in a neural encoder-decoder framework. It compares three ways of fusing the two source encodings before decoding: Basic concatenation, Child-Sum, and Multi-Source Attention. On the WMT 2014 tri-source dataset, it gains up to +4.8 BLEU over a strong single-source baseline, with larger gains when the two sources are more linguistically distant, which supports explicit multi-source integration. The work also analyzes attention behavior and releases code to support reproducibility and further research.
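As a rough illustration of the Basic concatenation approach, the sketch below fuses the final states of a French encoder and a German encoder into one initial decoder state of the same size, which matches the combiner contract described in the Figure 2 caption. The exact equations are an assumption on our part: hidden states are concatenated and projected back to size d with a tanh, and cell states are summed elementwise. All names (`basic_combiner`, `matvec`, `W_c`, the toy vectors) are hypothetical.

```python
import math
import random

def matvec(W, x):
    """Multiply a matrix W (a list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def basic_combiner(h1, c1, h2, c2, W_c):
    """Fuse two encoders' final LSTM states into one decoder initial state.

    Assumed form (hypothetical):
      h = tanh(W_c [h1; h2])   # W_c has shape d x 2d, so h has size d
      c = c1 + c2              # elementwise sum keeps size d
    """
    concat = h1 + h2  # list concatenation implements [h1; h2]
    h = [math.tanh(v) for v in matvec(W_c, concat)]
    c = [a + b for a, b in zip(c1, c2)]
    return h, c

# Toy demo with random, hypothetical encoder states of size d = 4.
d = 4
rng = random.Random(0)

def rand_vec(n):
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

h_fr, c_fr = rand_vec(d), rand_vec(d)  # stand-in for the French encoder's final state
h_de, c_de = rand_vec(d), rand_vec(d)  # stand-in for the German encoder's final state
W_c = [rand_vec(2 * d) for _ in range(d)]  # hypothetical learned d x 2d projection

h, c = basic_combiner(h_fr, c_fr, h_de, c_de, W_c)
print(len(h), len(c))  # prints: 4 4 (both equal d, as the decoder expects)
```

In this sketch, summing the cell states keeps the memory of both encoders without adding parameters, while the learned projection decides how to mix the two hidden states.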

Abstract

We build a multi-source machine translation model and train it to maximize the probability of a target English string given French and German sources. Using the neural encoder-decoder framework, we explore several combination methods and report up to +4.8 Bleu increases on top of a very strong attention-based neural translation model.
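The training objective in the abstract can be written out with the usual left-to-right factorization used by encoder-decoder models. Here $e$, $f$, and $g$ are the English, French, and German sentences, as in the TL;DR's $P(e|f,g)$; $\mathcal{D}$ (the set of training triples) and $\theta$ (the model parameters) are our own notation.

$$
\log P(e \mid f, g; \theta) = \sum_{t=1}^{|e|} \log P(e_t \mid e_{<t}, f, g; \theta),
\qquad
\hat{\theta} = \arg\max_{\theta} \sum_{(e, f, g) \in \mathcal{D}} \log P(e \mid f, g; \theta)
$$

A single-source baseline corresponds to dropping one of the two sources from the conditioning, e.g. $P(e_t \mid e_{<t}, f; \theta)$.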

Paper Structure

This paper contains 7 sections, 14 equations, and 6 figures.

Figures (6)

  • Figure 1: The encoder-decoder framework for neural machine translation (NMT) (Sutskever et al., 2014). Here, a source sentence C B A (presented in reverse order as A B C) is translated into a target sentence W X Y Z. At each step, an evolving real-valued vector summarizes the state of the encoder (white) and decoder (gray).
  • Figure 2: Multi-source encoder-decoder model for MT. We have two source sentences (C B A and K J I) in different languages. Each language has its own encoder; it passes its final hidden and cell state to a set of combiners (in black). The output of a combiner is a hidden state and cell state of the same dimension.
  • Figure 3: Trilingual corpus statistics.
  • Figure 4: Multi-source MT for target English, with source languages French and German. Ppl reports test-set perplexity as the system predicts English tokens. BLEU is scored with the multi-bleu.perl script from Moses against a single reference and is case-sensitive.
  • Figure 5: Action of the multi-attention model as the neural decoder generates target English from French/German sources (test set). Lines show strengths of $a_t(s)$.
  • ...and 1 more figure
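Figure 5 plots attention strengths $a_t(s)$ over both sources. The following is a minimal sketch of multi-source attention under stated assumptions: at decoder step $t$, the decoder state attends to each source's encoder states separately, producing one weight vector and one context vector per source, and the two contexts are fused with the decoder state as $\tanh(W_c[h_t; c^{fr}_t; c^{de}_t])$. This summary does not say which attention scoring the paper uses, so simple global dot-product scoring stands in here; all function and variable names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def attend(h_t, enc_states):
    """Global dot-product attention over one source's encoder states.

    Returns the weights a_t(s), one per source position s, and the
    context vector sum_s a_t(s) * h_s.
    """
    weights = softmax([dot(h_t, h_s) for h_s in enc_states])
    d = len(h_t)
    context = [sum(w * h_s[i] for w, h_s in zip(weights, enc_states)) for i in range(d)]
    return weights, context

def multi_source_attention(h_t, enc_fr, enc_de, W_c):
    """Attend to each source separately, then fuse: tanh(W_c [h_t; c_fr; c_de])."""
    a_fr, c_fr = attend(h_t, enc_fr)
    a_de, c_de = attend(h_t, enc_de)
    concat = h_t + c_fr + c_de  # size 3d
    h_tilde = [math.tanh(dot(row, concat)) for row in W_c]  # W_c is d x 3d
    return h_tilde, a_fr, a_de

# Toy demo with hypothetical states of size d = 2.
h_t = [0.5, -0.2]                                # decoder state at step t
enc_fr = [[0.1, 0.3], [0.4, -0.1], [0.2, 0.2]]   # 3 French positions
enc_de = [[0.3, 0.0], [-0.2, 0.5]]               # 2 German positions
W_c = [[0.1] * 6, [-0.1] * 6]                    # hypothetical 2 x 6 projection
h_tilde, a_fr, a_de = multi_source_attention(h_t, enc_fr, enc_de, W_c)
print(a_fr, a_de)  # one weight per source position; each list sums to 1 (up to rounding)
```

Keeping a separate weight vector per source is what lets a plot like Figure 5 show, for each target word, how strongly the decoder looks at each French and each German position.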