Fast Lexically Constrained Decoding with Dynamic Beam Allocation for Neural Machine Translation

Matt Post, David Vilar

TL;DR

This paper introduces Dynamic Beam Allocation (DBA), a lexically constrained decoding algorithm for neural machine translation whose complexity is constant in the number of constraints, achieved by dynamically distributing a fixed-size beam across constraint banks. DBA builds on Grid Beam Search (GBS) but, rather than letting the beam grow with the number of constraints, reallocates the slots of a fixed beam at each time step. This makes large sets of target-side constraints tractable and GPU-based constrained decoding practical. Experiments on English–German show that DBA is faster than GBS for comparable BLEU gains and places constraints robustly as their number grows; they also shed light on the relationship between model scores and BLEU. The approach is implemented in Sockeye and supports analysis of beam size, pruning, and constraint interactions, with potential applications in interactive translation and domain adaptation, where precise lexical control is required.

Abstract

The end-to-end nature of neural machine translation (NMT) removes many ways of manually guiding the translation process that were available in older paradigms. Recent work, however, has introduced a new capability: lexically constrained or guided decoding, a modification to beam search that forces the inclusion of pre-specified words and phrases in the output. However, while theoretically sound, existing approaches have computational complexities that are either linear (Hokamp and Liu, 2017) or exponential (Anderson et al., 2017) in the number of constraints. We present an algorithm for lexically constrained decoding with a complexity of O(1) in the number of constraints. We demonstrate the algorithm's remarkable ability to properly place these constraints, and use it to explore the shaky relationship between model and BLEU scores. Our implementation is available as part of Sockeye.

Paper Structure

This paper contains 19 sections, 1 equation, 9 figures, 3 tables, 1 algorithm.

Figures (9)

  • Figure 1: An example translating from English to German. The first translation is unconstrained, whereas the remaining ones have one or two constraints imposed. A word-for-word translation of the German output has been provided for the convenience of non-German speaking readers.
  • Figure 2: A single step of the constrained decoder. Along the left is the beam ($k=5$) at time step $t$. The shapes in this beam represent constraints, both met (filled) and unmet (outlined). The blue square represents a phrasal constraint of length 2, which must be completed in order (left half, then right half). A step of the decoder produces a $k \times V_T$ matrix of scores. Each constraint corresponds to a single token in the vocabulary, and is marked along the bottom. Gray squares denote the set of candidates that are produced (§\ref{section:generating}) from the $k$ best items ($\bigstar$), from extending each hypothesis with all unfilled constraints ($\rightarrow$), and from its single-best next token ($\Diamond$). Items that violate a phrasal constraint ($\circlearrowleft$) require the phrasal constraint from that hypothesis to be unwound (set to unmet). From these fifteen candidates, the beam at time step $t+1$ is filled according to the bank allocation strategy, which here assigns one slot in the beam to each bank. The final beam includes coordinates indicating the provenance of chosen items (which are also indicated in bold in the grid).
  • Figure 3: Beam reallocation for $k=5$ with 4 constraints at timestep $t$. There are eight candidates, each having met only 0 or 1 constraint. The allocation policy gives one slot of the beam to each bank. However, there are no candidates for banks 2--4 (greyed), so their slots are redistributed to banks 0 and 1.
  • Figure 4: Running time (seconds / sentence, lower is better) as a function of the number of constraints, $C$ (after applying BPE) on the rand3 dataset. The unconstrained baselines have BLEU scores of 22.3, 22.3, and 22.1 for $k=5,10$, and 20, respectively.
  • Figure 5: BLEU score as a function of beam size under DBA. All constraint sets improve as the beam gets larger (recall that the actual number of constraints increases after BPE and varies by sentence). rand4 performs below the unconstrained baseline if the beam is too small.
  • ...and 4 more figures
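
The beam reallocation described in the Figure 3 caption can be sketched in a few lines of Python. This is only an illustrative sketch, not the Sockeye implementation: the function name `allocate_beam`, the rule that sends any remainder of the even split to the higher-numbered banks, and the round-robin order used to hand out unused slots are all assumptions made here for the example. Bank $i$ holds candidates that have met $i$ constraints.

```python
from typing import List


def allocate_beam(k: int, candidates_per_bank: List[int]) -> List[int]:
    """Return how many of the k beam slots each bank receives.

    candidates_per_bank[i] is the number of available candidates that
    have met exactly i constraints (hypothetical input format).
    """
    num_banks = len(candidates_per_bank)

    # Step 1: split the beam evenly across banks. Any remainder goes to
    # the higher-numbered banks (an assumption of this sketch).
    base, extra = divmod(k, num_banks)
    wanted = [base + (1 if b >= num_banks - extra else 0)
              for b in range(num_banks)]

    # Step 2: a bank cannot use more slots than it has candidates.
    alloc = [min(w, c) for w, c in zip(wanted, candidates_per_bank)]
    surplus = k - sum(alloc)

    # Step 3: hand unused slots to banks that still have spare candidates,
    # one slot at a time, visiting higher banks first (assumed order).
    while surplus > 0:
        progressed = False
        for b in reversed(range(num_banks)):
            if surplus > 0 and alloc[b] < candidates_per_bank[b]:
                alloc[b] += 1
                surplus -= 1
                progressed = True
        if not progressed:
            break  # fewer than k candidates exist in total
    return alloc


# Setting of Figure 3: k = 5, four constraints (five banks), and eight
# candidates that have each met 0 or 1 constraint. The 5/3 split between
# banks 0 and 1 is made up for illustration.
print(allocate_beam(5, [5, 3, 0, 0, 0]))
```

As in the caption, banks 2 to 4 receive no slots because they have no candidates, and their three slots end up in banks 0 and 1. How those three slots are divided between the two banks depends on the assumed redistribution order, so the exact split should not be read as the paper's.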