Non-Autoregressive Neural Machine Translation with Enhanced Decoder Input

Junliang Guo, Xu Tan, Di He, Tao Qin, Linli Xu, Tie-Yan Liu

TL;DR

This paper addresses the accuracy bottleneck of non-autoregressive translation (NAT) by enriching the decoder inputs with target-side information. It introduces Enhanced NAT (ENAT), which employs two strategies: a phrase-table lookup that translates source phrases into target tokens, and an embedding-mapping approach that learns a linear projection from the source to the target embedding space using sentence-level alignment and word-level adversarial learning. With sequence-level knowledge distillation and a length-prediction mechanism during inference, ENAT achieves substantial BLEU gains over NAT baselines on WMT14 En-De and WMT16 En-Ro, bringing NAT performance closer to autoregressive translation (AT) models while maintaining its speed advantage. The results demonstrate the value of stronger decoder-side signals for NAT and offer practical guidance on when to favor phrase-table signals versus learned embedding mappings, depending on dataset quality. The work opens avenues for broader language coverage and more sophisticated decoder-input augmentation strategies to further narrow the AT-NAT performance gap in real-time translation systems.

Abstract

Non-autoregressive translation (NAT) models, which remove the dependence on previous target tokens from the decoder inputs, achieve a significant inference speedup but at the cost of inferior accuracy compared to autoregressive translation (AT) models. Previous work shows that the quality of the decoder inputs is important and strongly affects model accuracy. In this paper, we propose two methods to enhance the decoder inputs and thereby improve NAT models. The first directly leverages a phrase table generated by conventional SMT approaches to translate source tokens to target tokens, which are then fed into the decoder as inputs. The second transforms source-side word embeddings into target-side word embeddings through sentence-level alignment and word-level adversarial learning, and then feeds the transformed word embeddings into the decoder as inputs. Experimental results show that our method outperforms the NAT baseline (Gu et al., 2017) by 5.11 BLEU on the WMT14 English-German task and by 4.72 BLEU on the WMT16 English-Romanian task.
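
To make the two decoder-input strategies in the abstract more concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy phrase table, the greedy longest-match segmentation, the copy-through of unmatched tokens, and the hand-written matrix W are all hypothetical. In the paper the mapping is learned with sentence-level alignment and word-level adversarial learning, which this sketch does not attempt to reproduce.

```python
from typing import Dict, List, Tuple

# Hypothetical toy phrase table: source phrase (tuple of tokens) -> target tokens.
PhraseTable = Dict[Tuple[str, ...], List[str]]


def phrase_table_decoder_input(
    src: List[str], table: PhraseTable, max_len: int = 3
) -> List[str]:
    """Build decoder input tokens by looking up source phrases in a phrase table.

    Assumption: phrases are matched greedily, longest first, left to right,
    and source tokens with no table entry are copied through unchanged.
    """
    out: List[str] = []
    i = 0
    while i < len(src):
        # Try the longest candidate phrase starting at position i first.
        for n in range(min(max_len, len(src) - i), 0, -1):
            phrase = tuple(src[i:i + n])
            if phrase in table:
                out.extend(table[phrase])
                i += n
                break
        else:
            out.append(src[i])  # no match of any length: copy the token
            i += 1
    return out


def map_embeddings(
    src_emb: List[List[float]], W: List[List[float]]
) -> List[List[float]]:
    """Apply a linear map W (d_src x d_tgt) to every source embedding row.

    W is a fixed matrix here; learning it is outside the scope of this sketch.
    """
    columns = list(zip(*W))  # columns of W, each of length d_src
    return [
        [sum(x * w for x, w in zip(row, col)) for col in columns]
        for row in src_emb
    ]
```

For example, with a table holding ("thank", "you") -> ["danke"], the source tokens ["thank", "you", "!"] would produce the decoder input ["danke", "!"] under the copy-through assumption. Either output (target tokens or mapped embeddings) stands in for the copied source embeddings that a plain NAT decoder would otherwise receive.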

Paper Structure

This paper contains 17 sections, 10 equations, 2 figures, 5 tables.

Figures (2)

  • Figure 1: The architecture of our model. A concrete description of fine-grained modules can be found in Section \ref{sec:arch}.
  • Figure 2: BLEU score comparison between AT, NART, and our method over sentences in different length buckets on newstest2014. Best viewed in color.