Correcting Length Bias in Neural Machine Translation
Kenton Murray, David Chiang
TL;DR
This paper identifies two problems in neural machine translation (NMT), the degradation of translation quality under wider beams and the tendency toward overly short translations, as manifestations of label bias in locally normalized models. It argues for a lightweight, globally normalized correction in the form of a tunable per-word reward (γ) and compares it with length normalization across multiple language pairs, showing that it largely eliminates the beam problem and improves translation quality. A perceptron-like method enables fast, dataset-specific tuning of γ, whose optimal value depends strongly on beam size and task. The findings suggest that incorporating such global corrections into baselines can significantly improve decoding robustness and translation quality in NMT.
Abstract
We study two problems in neural machine translation (NMT). First, in beam search, whereas a wider beam should in principle help translation, it often hurts NMT. Second, NMT has a tendency to produce translations that are too short. Here, we argue that these problems are closely related and both rooted in label bias. We show that correcting the brevity problem almost eliminates the beam problem; we compare some commonly-used methods for doing this, finding that a simple per-word reward works well; and we introduce a simple and quick way to tune this reward using the perceptron algorithm.
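The per-word reward and its perceptron-style tuning can be illustrated with a minimal sketch. It assumes that each hypothesis is scored as its model log-probability plus γ times its length in words, and that γ is nudged toward making the chosen hypothesis as long as the reference. The names `rescore` and `tune_gamma`, the learning rate `eta`, and the use of a fixed candidate list in place of re-running beam search are all illustrative assumptions, not the authors' implementation.

```python
from typing import List, Tuple

# A hypothesis is (model log-probability, length in words). Hypothetical type.
Hypothesis = Tuple[float, int]


def rescore(hyps: List[Hypothesis], gamma: float) -> int:
    """Return the index of the hypothesis maximizing log p + gamma * length."""
    return max(range(len(hyps)), key=lambda i: hyps[i][0] + gamma * hyps[i][1])


def tune_gamma(
    data: List[List[Hypothesis]],
    ref_lengths: List[int],
    eta: float = 0.1,
    epochs: int = 20,
    gamma: float = 0.0,
) -> float:
    """Perceptron-style tuning of the per-word reward (a sketch).

    Assumption: a real system would re-decode with the current gamma at each
    step; here we only rescore a fixed candidate list per sentence.
    """
    for _ in range(epochs):
        for hyps, ref_len in zip(data, ref_lengths):
            best_len = hyps[rescore(hyps, gamma)][1]
            # Raise gamma when the output is too short, lower it when too long.
            gamma += eta * (ref_len - best_len)
    return gamma


if __name__ == "__main__":
    # Toy candidates: the shorter hypothesis has the higher model score.
    candidates = [(-2.0, 3), (-2.5, 5)]
    g = tune_gamma([candidates], ref_lengths=[5])
    print("tuned gamma:", g, "chosen index:", rescore(candidates, g))
```

In this toy setting the unrewarded model prefers the three-word hypothesis; once γ is large enough, the five-word hypothesis that matches the reference length wins, and the update becomes zero.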
