What Have We Achieved on Non-autoregressive Translation?

Yafu Li; Huajian Zhang; Jianhao Yan; Yongjing Yin; Yue Zhang

TL;DR

This paper evaluates how close non-autoregressive translation (NAT) approaches truly come to autoregressive translation (AT) beyond BLEU, using four representative NAT methods and human evaluation. By systematically comparing MgMO, CTC, DAT, and CMLM against AT on WMT benchmarks across automatic metrics, GPT-4–based judgments, and MQM human ratings, it shows that AT generally outperforms NAT on model-based and human-aligned metrics, even as some NAT variants approach AT on rule-based metrics. A key finding is that explicit modeling of target-side dependencies markedly improves translation fluency and generalization, while weaknesses in dependency modeling lead to repetitions, omissions, and spelling errors. The work highlights that advancing NAT requires stronger explicit dependency modeling without sacrificing decoding speed, guiding future research toward more faithful and robust one-shot translation systems.

Abstract

Recent advances have made non-autoregressive translation (NAT) comparable to autoregressive translation (AT). However, these comparisons rely on BLEU, which has been shown to correlate only weakly with human annotations. Limited research compares NAT and AT comprehensively, leaving uncertainty about the true proximity of NAT to AT. To address this gap, we systematically evaluate four representative NAT methods across various dimensions, including human evaluation. Our empirical results demonstrate that despite narrowing the performance gap, state-of-the-art NAT still underperforms AT under more reliable evaluation metrics. Furthermore, we discover that explicitly modeling dependencies is crucial for generating natural language and generalizing to out-of-distribution sequences.

Paper Structure (41 sections, 15 equations, 5 figures, 16 tables)

Introduction
Method
Neural Machine Translation
Autoregressive Translation.
Non-autoregressive Translation.
Challenges of NAT.
NAT with Advanced Optimization
NAT with Latent Alignments
NAT with Explicit Dependency
NAT with Iterative Refinement
Experiment and Setup
Datasets and Models.
Evaluation.
Translation Quality
Automatic Evaluation
...and 26 more sections

Figures (5)

Figure 1: Heatmap visualization of MQM evaluation: darker colours indicate larger error counts for certain error types. The left side presents major-level errors while the right side shows minor-level errors.
Figure 2: N-gram repetition of different models (WMT21 De$\Rightarrow$En), where the x-axis represents the size of the n-gram and the y-axis represents the count.
Figure 3: Translation quality (COMET) w.r.t. source sequence length on WMT21 De$\Rightarrow$En.
Figure 4: Average cross-domain performance (COMET) of WMT21 De$\Rightarrow$En models on out-of-domain testsets.
Figure 5: Translation performance (COMET) decreases (%) on noisy testsets of WMT21 De$\Rightarrow$En, with darker colours indicating greater degradation.
