Improving Non-autoregressive Machine Translation with Error Exposure and Consistency Regularization

Xinran Chen; Sufeng Duan; Gongshen Liu

TL;DR

This paper constructs mixed sequences from model predictions during training and optimizes over the masked tokens under imperfect observation conditions. It also designs a consistency learning method that constrains the data distribution under different observing situations, narrowing the gap between training and inference.

Abstract

As one of the IR-NAT (iterative-refinement-based NAT) frameworks, the Conditional Masked Language Model (CMLM) adopts the mask-predict paradigm to re-predict masked low-confidence tokens. However, CMLM suffers from a data distribution discrepancy between training and inference: the observed tokens are generated differently in the two cases. In this paper, we address this problem with two training approaches, error exposure and consistency regularization (EECR). We construct mixed sequences based on model predictions during training and optimize over the masked tokens under imperfect observation conditions. We also design a consistency learning method that constrains the data distribution of the masked tokens under different observing situations, narrowing the gap between training and inference. Experiments on five translation benchmarks show average improvements of 0.68 and 0.40 BLEU over the respective base models, and our CMLMC-EECR achieves the best performance, with translation quality comparable to the Transformer. The experimental results demonstrate the effectiveness of our method.
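
As a rough illustration of the error-exposure idea described in the abstract, the sketch below builds a "mixed sequence": some target positions are masked, and each remaining observed position is replaced by a model prediction with some probability. The function name `build_mixed_sequence`, the `replace_prob` parameter, and the per-token sampling scheme are assumptions made for illustration; the abstract does not specify how the substituted positions are chosen.

```python
import random

MASK = "[M]"  # mask symbol, written as in the Figure 1 caption


def build_mixed_sequence(target, predicted, masked_positions, replace_prob, rng):
    """Return a partially masked copy of `target` in which observed tokens
    are swapped for model predictions with probability `replace_prob`.

    All names and the sampling scheme are hypothetical, not the paper's code.
    """
    if len(target) != len(predicted):
        raise ValueError("target and predicted must have the same length")
    mixed = []
    for i, (gold, pred) in enumerate(zip(target, predicted)):
        if i in masked_positions:
            # Masked positions stay masked; these are the tokens to optimize.
            mixed.append(MASK)
        elif rng.random() < replace_prob:
            # Expose the model to its own (possibly wrong) prediction.
            mixed.append(pred)
        else:
            # Keep the ground-truth token as a clean observation.
            mixed.append(gold)
    return mixed


if __name__ == "__main__":
    rng = random.Random(0)
    gold = ["the", "cat", "sat", "down"]
    pred = ["a", "cat", "sits", "down"]
    print(build_mixed_sequence(gold, pred, {1}, 0.5, rng))
```

With `replace_prob` set to 0 this reduces to the usual partially masked ground truth, so the standard masked training input is a special case of the sketch.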

Paper Structure (40 sections, 11 equations, 4 figures, 13 tables, 1 algorithm)

Introduction
Related Work
Approach
Training With Error Exposure
Consistency Regularization
Training and Inference
Length prediction
Training Algorithm
Inference
Experiments
Setup
Dataset
Sequence-level Knowledge Distillation
Details
Evaluation
...and 25 more sections

Figures (4)

Figure 1: Overview of our EECR strategy. The left part illustrates the sequence prediction process used for mixed sequence generation. The decoder refines the predicted sequence $\hat{Y}$ $k$ times, each time based on the sequence from the previous step $\hat{Y}_{Prev}$ (blue dotted arrows). Subsequently, tokens in the partially masked ground truth sequences are randomly substituted with the predicted tokens $\hat{y}_1$ and $\hat{y}_5$ (dashed arrows), yielding the mixed sequences $Y^{1}$ and $Y^{2}$. The right part depicts the consistency learning process. The probability distributions of the masked tokens $\texttt{[M]}$ under the ground truth and mixed sequences are constrained by the consistency regularization (bidirectional arrows).
Figure 2: The cosine similarity of masked token representations under different observing scenarios of CMLM-EECR and CMLM.
Figure 3: Training curves of CMLM and CMLM-EECR on the IWSLT14 DE$\rightarrow$EN validation set. The number of inference iterations is set to 1.
Figure 4: Translation quality on WMT16 EN$\rightarrow$RO test set over the sentence groups of different lengths.
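
The consistency learning step in the Figure 1 caption constrains the masked-token distributions obtained under the ground truth and mixed sequences. One common way to write such a constraint is a symmetric (bidirectional) KL divergence averaged over masked positions. The sketch below assumes that form; the captions and abstract do not state the exact loss, and the names `kl_divergence` and `consistency_loss` are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of floats.

    `eps` is a small smoothing constant (an assumption) to avoid log(0).
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


def consistency_loss(dists_a, dists_b):
    """Average symmetric KL over masked positions.

    `dists_a[i]` and `dists_b[i]` are the predicted vocabulary distributions
    for the i-th masked token under two observing situations (for example,
    ground-truth context versus mixed context). Assumed form, not the
    paper's stated objective.
    """
    if len(dists_a) != len(dists_b):
        raise ValueError("both inputs must cover the same masked positions")
    if not dists_a:
        return 0.0
    total = 0.0
    for p, q in zip(dists_a, dists_b):
        total += 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
    return total / len(dists_a)


if __name__ == "__main__":
    clean = [[0.7, 0.2, 0.1]]   # masked-token distribution, clean context
    mixed = [[0.5, 0.3, 0.2]]   # same token, mixed context
    print(consistency_loss(clean, mixed))
```

The loss is zero when the two contexts produce identical distributions and grows as they diverge, which matches the stated goal of making predictions agree across observing situations.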
