A Survey of Deep Learning Techniques for Neural Machine Translation

Shuoheng Yang, Yuxin Wang, Xiaowen Chu

TL;DR

This survey traces the evolution of neural machine translation from rule-based and statistical approaches to end-to-end deep learning, emphasizing attention mechanisms and encoder–decoder architectures. It details major models (RNN-, CNN-, and Transformer-based), decoding strategies, and methods for handling vocabulary and alignment, such as subword modeling and Copy mechanisms. The paper highlights key advances like global/local attention, GNMT and ConvS2S, and, most notably, Transformer-based architectures with self-attention, while outlining ongoing challenges in long sentences, OOV handling, and low-resource multilingual settings. Collectively, the work clarifies how these innovations enable higher translation quality and faster inference, and it points to practical directions for scaling NMT in real-world applications and under-resourced languages.

Abstract

In recent years, natural language processing (NLP) has advanced rapidly with deep learning techniques. In the sub-field of machine translation, a new approach named Neural Machine Translation (NMT) has emerged and attracted considerable attention from both academia and industry. However, despite the large number of studies proposed in the past several years, little work has investigated the development of this new technology trend. This literature survey traces the origin and principal development timeline of NMT, investigates its important branches, categorizes different research orientations, and discusses some future research trends in this field.

Paper Structure

This paper contains 43 sections, 13 equations, 14 figures, 1 table, 1 algorithm.

Figures (14)

  • Figure 1: The training process of RNN-based NMT. The symbol $<EOS>$ marks the end of a sequence. The embedding layer is for pre-processing. The two RNN layers are used to represent the sequence.
  • Figure 2: End-to-end structure in a modern NMT model. The encoder represents the source sentence as a semantic vector, while the decoder predicts the target sentence from this semantic vector. End-to-end means the model maps source data to target data directly, without an interpretable intermediate result.
  • Figure 3: The concept of a bidirectional RNN.
  • Figure 4: The process of greedy decoding: at each time step the model predicts the word with the highest probability and uses that result as the input at the next time step to obtain further predictions.
  • Figure 5: The concept of the attention mechanism, which provides additional alignment information rather than relying only on the information in a fixed-length vector.
  • ...and 9 more figures
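
The greedy decoding procedure described in the Figure 4 caption can be sketched in a few lines. This is a minimal illustration, not the survey's implementation: the names `greedy_decode`, `toy_next_probs`, and `TOY_TABLE` are hypothetical, and the hand-written probability table stands in for a trained decoder.

```python
from typing import Callable, Dict, List

EOS = "<EOS>"  # end-of-sequence symbol, as in the Figure 1 caption


def greedy_decode(
    next_probs: Callable[[List[str]], Dict[str, float]],
    max_len: int = 20,
) -> List[str]:
    """Pick the highest-probability word at each step and feed it back in."""
    output: List[str] = []
    for _ in range(max_len):
        probs = next_probs(output)        # distribution over the next word
        word = max(probs, key=probs.get)  # greedy choice: the arg max
        if word == EOS:
            break                         # stop once the model emits <EOS>
        output.append(word)               # chosen word conditions the next step
    return output


# Hypothetical toy "model": next-word probabilities keyed on the last word only.
# A real NMT decoder would condition on the source sentence and the full prefix.
TOY_TABLE = {
    None: {"the": 0.6, "a": 0.3, EOS: 0.1},
    "the": {"cat": 0.7, "dog": 0.2, EOS: 0.1},
    "cat": {"sleeps": 0.5, "the": 0.1, EOS: 0.4},
    "sleeps": {EOS: 0.9, "the": 0.1},
}


def toy_next_probs(prefix: List[str]) -> Dict[str, float]:
    last = prefix[-1] if prefix else None
    return TOY_TABLE[last]


print(greedy_decode(toy_next_probs))  # ['the', 'cat', 'sleeps']
```

Because each step commits to a single word, greedy decoding can miss a sequence whose overall probability is higher; this is the usual motivation for alternatives such as beam search.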