Tagged Back-Translation

Isaac Caswell, Ciprian Chelba, David Grangier

TL;DR

Tagged Back-Translation (TaggedBT) marks synthetic back-translated (BT) data with a distinct input tag that signals its origin, letting the model treat BT data as a separate domain. The approach matches or surpasses NoisedBT across language pairs, with strong gains on EnRo and competitive results on EnDe, and it also enables iterative back-translation in some setups. Analyses show that the model attends heavily to the tag and that the tag shifts decoding toward BT-domain output, supporting the idea that simple domain signaling separates the beneficial signal in synthetic data from its biases. Overall, tagging is a simpler, robust alternative to noising back-translated data, with practical benefits for NMT systems.

Abstract

Recent work in Neural Machine Translation (NMT) has shown significant quality gains from noised-beam decoding during back-translation, a method to generate synthetic parallel data. We show that the main role of such synthetic noise is not to diversify the source side, as previously suggested, but simply to indicate to the model that the given source is synthetic. We propose a simpler alternative to noising techniques, consisting of tagging back-translated source sentences with an extra token. Our results on WMT outperform noised back-translation in English-Romanian and match performance on English-German, re-defining state-of-the-art in the former.
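
The tagging step described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the tag string `<BT>` and the helper names `tag_synthetic` and `build_training_set` are assumptions made for illustration, and a real pipeline would also need to reserve the tag in the model's vocabulary so it is never split or confused with ordinary text.

```python
from typing import Iterable, List, Tuple

SentencePair = Tuple[str, str]  # (source sentence, target sentence)

# Hypothetical tag string (an assumption, not taken from the paper);
# any reserved token that never occurs in genuine text would serve.
BT_TAG = "<BT>"


def tag_synthetic(pairs: Iterable[SentencePair], tag: str = BT_TAG) -> List[SentencePair]:
    """Prepend the tag to the source side of each back-translated pair."""
    return [(f"{tag} {src}", tgt) for src, tgt in pairs]


def build_training_set(
    genuine: Iterable[SentencePair],
    synthetic: Iterable[SentencePair],
    tag: str = BT_TAG,
) -> List[SentencePair]:
    """Combine untouched genuine bitext with tagged back-translated pairs."""
    return list(genuine) + tag_synthetic(synthetic, tag)


if __name__ == "__main__":
    # Genuine parallel data is left unchanged.
    genuine = [("a house", "ein Haus")]
    # Synthetic pairs: the source side was produced by a reverse
    # (target-to-source) model from monolingual target text.
    synthetic = [("a cat", "eine Katze")]
    for src, tgt in build_training_set(genuine, synthetic):
        print(f"{src}\t{tgt}")
```

Only the synthetic source sentences change; targets and genuine pairs are passed through untouched, so the tag is the sole signal that distinguishes the two kinds of data.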

Paper Structure

This paper contains 23 sections, 2 equations, 1 figure, 9 tables.

Figures (1)

  • Figure 1: Comparison of attention maps at the first encoder layer for a random training example for BT (row 1), NoisedBT (row 2), and TaggedBT (row 3), for both EnDe (col 1) and EnRo (col 2). Note the heavy attention on the tag (position 0 in row 3), and the diffuse attention map learned by the NoisedBT models. These are the models from Table \ref{en_de_bt}.a