A Survey of Deep Learning Techniques for Neural Machine Translation
Shuoheng Yang, Yuxin Wang, Xiaowen Chu
TL;DR
This survey traces the evolution of neural machine translation from rule-based and statistical approaches to end-to-end deep learning, emphasizing attention mechanisms and encoder–decoder architectures. It details major models (RNN-, CNN-, and Transformer-based), decoding strategies, and methods for handling vocabulary and alignment, such as subword modeling and copy mechanisms. The paper highlights key advances like global/local attention, GNMT and ConvS2S, and, most notably, Transformer-based architectures with self-attention, while outlining ongoing challenges with long sentences, OOV handling, and low-resource multilingual settings. Collectively, the work clarifies how these innovations enable higher translation quality and faster inference, and it points to practical directions for scaling NMT in real-world applications and under-resourced languages.
Abstract
In recent years, natural language processing (NLP) has advanced considerably through deep learning techniques. In the sub-field of machine translation, a new approach named Neural Machine Translation (NMT) has emerged and attracted massive attention from both academia and industry. However, despite the significant number of studies published in the past several years, little work has investigated the development of this new technology trend. This literature survey traces the origin and principal development timeline of NMT, investigates the important branches, categorizes different research orientations, and discusses some future research trends in this field.
