Understanding and Mitigating the Uncertainty in Zero-Shot Translation

Wenxuan Wang; Wenxiang Jiao; Shuo Wang; Zhaopeng Tu; Michael R. Lyu

TL;DR

Two lightweight and complementary approaches are proposed to improve zero-shot translation over strong MNMT baselines: denoising the training data, and explicitly penalizing off-target translations with unlikelihood training.

Abstract

Zero-shot translation is a promising direction for building a comprehensive multilingual neural machine translation (MNMT) system. However, its quality is still unsatisfactory because of off-target issues. In this paper, we aim to understand and alleviate the off-target issues from the perspective of uncertainty in zero-shot translation. By carefully examining the translation output and model confidence, we identify two uncertainties that are responsible for the off-target issues, namely, extrinsic data uncertainty and intrinsic model uncertainty. Based on these observations, we propose two lightweight and complementary approaches: denoising the training data, and explicitly penalizing off-target translations with unlikelihood training. Extensive experiments on both balanced and imbalanced datasets show that our approaches significantly improve the performance of zero-shot translation over strong MNMT baselines.
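To make the idea of penalizing off-target translations concrete, here is a minimal token-level sketch of an unlikelihood term added to the usual maximum-likelihood loss. It is an illustration under stated assumptions, not the paper's implementation: the function names, the toy vocabulary, the choice of negative candidates, and the way the weight `alpha` combines the two terms are all hypothetical.

```python
import math

def mle_loss(probs, target_id):
    """Standard negative log-likelihood of the reference token."""
    return -math.log(probs[target_id])

def unlikelihood_loss(probs, negative_ids, eps=1e-12):
    """Penalize probability mass placed on unwanted tokens.

    Each negative candidate c contributes -log(1 - p(c)), which is zero when
    p(c) = 0 and grows as the model becomes more confident in c.
    """
    return -sum(math.log(max(1.0 - probs[i], eps)) for i in negative_ids)

def combined_loss(probs, target_id, negative_ids, alpha=0.1):
    """Assumed combination: MLE loss plus alpha times the unlikelihood term."""
    return mle_loss(probs, target_id) + alpha * unlikelihood_loss(probs, negative_ids)

# Toy 4-token vocabulary (hypothetical): ids 0-1 stand for target-language
# tokens, ids 2-3 for off-target tokens.
off_target_ids = [2, 3]
confident = [0.90, 0.05, 0.03, 0.02]   # little mass on off-target tokens
uncertain = [0.40, 0.10, 0.30, 0.20]   # substantial mass on off-target tokens

print(combined_loss(confident, 0, off_target_ids))
print(combined_loss(uncertain, 0, off_target_ids))
```

The second distribution receives the larger penalty because half of its probability mass sits on the off-target ids, which is the behavior the unlikelihood term is meant to discourage.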

Paper Structure (38 sections, 5 equations, 4 figures, 13 tables)

Introduction
Preliminary
Multilingual Neural Machine Translation
Definition of Off-Target Issue
Existing Works on Off-Target Issue
Experimental Setup
Training Data
Validation Set
Multi-Source Test Set
Model
Analyzing Uncertainty
MNMT Models are Well Trained
Poor Zero-Shot Performance and Off-Target Issues
Uncertain Prediction Causes Off-Target Issues
Extrinsic Data Uncertainty
...and 23 more sections

Figures (4)

Figure 1: Per-token probabilities of (a) supervised Fr-En (BLEU$\uparrow$: 38.8; OTR$\downarrow$: 1.6) and (b) zero-shot Fr-De (BLEU$\uparrow$: 5.4; OTR$\downarrow$: 74.9) translations. Higher probabilities are expected for the on-target references ("Reference"), and lower probabilities are expected for the off-target distractor translations ("Off-Target").
Figure 2: Probability over the vocabulary for supervised (En-De, Fr-En) and zero-shot Fr-De translations. The MNMT model overestimates the off-target vocabulary for zero-shot translation.
Figure 3: Impact of interpolation weight $\alpha$ and fine-tune step $K$ on zero-shot translations.
Figure 4: Per-token probabilities of off-target test sentences in zero-shot Fr-De translation for the T-Enc model with our methods on OPUS-6 data.
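
The captions above report OTR alongside BLEU. As a rough sketch, assuming OTR denotes the percentage of system outputs that are not in the intended target language, it could be computed from per-sentence language labels as below. The function name and the input labels are hypothetical; in practice the labels would come from an external language identifier, which is not shown here.

```python
def off_target_ratio(predicted_langs, target_lang):
    """Percentage of outputs whose detected language differs from target_lang.

    predicted_langs: one language code per translated sentence, assumed to be
    produced by some external language-identification tool.
    """
    if not predicted_langs:
        raise ValueError("predicted_langs must not be empty")
    off_target = sum(1 for lang in predicted_langs if lang != target_lang)
    return 100.0 * off_target / len(predicted_langs)

# Hypothetical labels for four Fr-De outputs: two landed in the wrong language.
labels = ["de", "en", "de", "fr"]
print(off_target_ratio(labels, "de"))
```

Lower values are better, matching the downward arrow used for OTR in the captions.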
