Lexically Constrained Decoding for Sequence Generation Using Grid Beam Search

Chris Hokamp, Qun Liu

TL;DR

This paper introduces Grid Beam Search (GBS), a decoding algorithm that enforces user-specified lexical constraints within sequence generation without modifying model parameters. By organizing decoding on a t-by-c grid and distinguishing open/closed constraint states, GBS can handle multi-token and discontinuous constraints while leveraging the underlying model's probabilities. Empirical results in interactive machine translation show large improvements when constraints are provided, and domain adaptation experiments demonstrate meaningful BLEU gains using automatically mined terminology. Overall, GBS offers a flexible, general approach to constraint-aware decoding applicable to MT and other text-generation tasks, with potential for broader adoption and future constraint-aware model developments.

Abstract

We present Grid Beam Search (GBS), an algorithm which extends beam search to allow the inclusion of pre-specified lexical constraints. The algorithm can be used with any model that generates a sequence $ \mathbf{\hat{y}} = \{y_{0}\ldots y_{T}\} $, by maximizing $ p(\mathbf{y} | \mathbf{x}) = \prod\limits_{t}p(y_{t} | \mathbf{x}; \{y_{0} \ldots y_{t-1}\}) $. Lexical constraints take the form of phrases or words that must be present in the output sequence. This is a very general way to incorporate additional knowledge into a model's output without requiring any modification of the model parameters or training data. We demonstrate the feasibility and flexibility of Lexically Constrained Decoding by conducting experiments on Neural Interactive-Predictive Translation, as well as Domain Adaptation for Neural Machine Translation. Experiments show that GBS can provide large improvements in translation quality in interactive scenarios, and that, even without any user input, GBS can be used to achieve significant gains in performance in domain adaptation scenarios.
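
The factorization and the notion of a lexical constraint in the abstract can be made concrete with a minimal sketch. It is not the authors' code: the `NextTokenDist` interface, the `uniform_model` toy distribution, and all function names are assumptions introduced for illustration. The sketch scores a sequence as a sum of per-token log-probabilities and checks whether every constraint phrase appears contiguously in the output.

```python
import math
from typing import Callable, Dict, Sequence

# Hypothetical interface (an assumption): given the source x and the target
# prefix y_0..y_{t-1}, return a probability for each candidate next token.
NextTokenDist = Callable[[Sequence[str], Sequence[str]], Dict[str, float]]


def sequence_log_prob(model: NextTokenDist, x: Sequence[str], y: Sequence[str]) -> float:
    """log p(y | x) = sum over t of log p(y_t | x; y_0 .. y_{t-1})."""
    total = 0.0
    for t, token in enumerate(y):
        dist = model(x, y[:t])
        total += math.log(dist[token])
    return total


def contains_phrase(y: Sequence[str], phrase: Sequence[str]) -> bool:
    """True if `phrase` occurs as a contiguous run of tokens in `y`."""
    n = len(phrase)
    return any(list(y[i:i + n]) == list(phrase) for i in range(len(y) - n + 1))


def satisfies_constraints(y: Sequence[str], constraints: Sequence[Sequence[str]]) -> bool:
    """A lexical constraint is a word or phrase that must be present in the output."""
    return all(contains_phrase(y, c) for c in constraints)


def uniform_model(x: Sequence[str], prefix: Sequence[str]) -> Dict[str, float]:
    """Toy stand-in for a real model: uniform over a four-token vocabulary."""
    return {"a": 0.25, "b": 0.25, "c": 0.25, "</s>": 0.25}


score = sequence_log_prob(uniform_model, ["src"], ["a", "b", "</s>"])
ok = satisfies_constraints(["a", "b", "</s>"], [["a", "b"]])
```

Checking constraints only after decoding would mean discarding most of the beam; the point of GBS is to enforce them during the search itself.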

Paper Structure

This paper contains 17 sections, 6 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: A visualization of the decoding process for an actual example from our English-German MT experiments. The output token at each timestep appears at the top of the figure, with lexical constraints enclosed in boxes. Generation is shown in blue, Starting new constraints in green, and Continuing constraints in red. The function used to create the hypothesis at each timestep is written at the bottom. Each box in the grid represents a beam; a colored strip inside a beam represents an individual hypothesis in the beam's $k$-best stack. Hypotheses with circles inside them are closed; all other hypotheses are open. (Best viewed in colour).
  • Figure 2: Different structures for beam search. Boxes represent beams which hold $k$-best lists of hypotheses. (A) Chart Parsing using SCFG rules to cover spans in the input. (B) Source coverage as used in PB-SMT. (C) Sequence timesteps, as used in Neural Sequence Models; GBS is an extension of (C). In (A) and (B), hypotheses are finished once they reach the final beam. In (C), a hypothesis is only complete if it has generated an end-of-sequence (EOS) symbol.
  • Figure 3: Visualizing the lexically constrained decoder's complete search graph. Each rectangle represents a beam containing $k$ hypotheses. Dashed (diagonal) edges indicate starting or continuing constraints. Horizontal edges represent generating from the model's distribution. The horizontal axis covers the timesteps in the output sequence, and the vertical axis covers the constraint tokens (one row for each token in each constraint). Beams on the top level of the grid contain hypotheses which cover all constraints.
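
The grid described in the captions above can be sketched in a few dozen lines. This is a simplified illustration under stated assumptions, not the paper's implementation: the `Model` interface, the `Hyp` record, `toy_model`, and the helper names are hypothetical; scores are raw summed log-probabilities with no length normalization; and, as a simplification, end-of-sequence may only be generated in the top row. A beam at `(t, c)` holds hypotheses of length `t` that cover `c` constraint tokens; horizontal moves generate from the model, diagonal moves start or continue a constraint.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple

EOS = "</s>"
# Hypothetical model interface (an assumption): prefix -> {next token: probability}.
Model = Callable[[Tuple[str, ...]], Dict[str, float]]


@dataclass(frozen=True)
class Hyp:
    tokens: Tuple[str, ...]
    score: float                              # summed log-probability
    done: Tuple[bool, ...]                    # which constraints are fully covered
    active: Optional[Tuple[int, int]] = None  # (constraint, next position) when closed

    @property
    def is_open(self) -> bool:
        return self.active is None

    @property
    def ended(self) -> bool:
        return bool(self.tokens) and self.tokens[-1] == EOS


def _extend(model, h, token, done, active):
    p = model(h.tokens).get(token, 0.0)
    if p <= 0.0:
        return None
    return Hyp(h.tokens + (token,), h.score + math.log(p), done, active)


def _advance(model, h, constraints, i, pos):
    """Emit token `pos` of constraint `i`; mark it done if that was its last token."""
    last = pos + 1 == len(constraints[i])
    done = tuple(d or (last and j == i) for j, d in enumerate(h.done))
    return _extend(model, h, constraints[i][pos], done, None if last else (i, pos + 1))


def grid_beam_search(model: Model, constraints: Sequence[Tuple[str, ...]],
                     max_len: int, k: int = 4) -> Optional[Hyp]:
    """Return the best EOS-terminated hypothesis covering all (non-empty) constraints."""
    num_c = sum(len(c) for c in constraints)
    grid: Dict[Tuple[int, int], List[Hyp]] = {
        (0, 0): [Hyp((), 0.0, (False,) * len(constraints))]
    }
    finished: List[Hyp] = []
    for t in range(1, max_len + 1):
        for c in range(min(t, num_c) + 1):
            cands = []
            # Horizontal edge: open hypotheses generate from the model.
            for h in grid.get((t - 1, c), []):
                if h.is_open and not h.ended:
                    for tok in model(h.tokens):
                        if tok == EOS and c < num_c:
                            continue  # simplification: EOS only in the top row
                        cands.append(_extend(model, h, tok, h.done, None))
            # Diagonal edge: start a new constraint or continue an unfinished one.
            for h in grid.get((t - 1, c - 1), []):
                if h.ended:
                    continue
                if h.is_open:
                    for i, is_done in enumerate(h.done):
                        if not is_done:
                            cands.append(_advance(model, h, constraints, i, 0))
                else:
                    cands.append(_advance(model, h, constraints, *h.active))
            cands = [h for h in cands if h is not None]
            grid[(t, c)] = sorted(cands, key=lambda h: h.score, reverse=True)[:k]
        # Only top-row hypotheses that generated EOS count as complete.
        finished += [h for h in grid.get((t, num_c), []) if h.ended]
    return max(finished, key=lambda h: h.score) if finished else None


def toy_model(prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Toy distribution: prefers "a" first, then prefers to stop."""
    if not prefix:
        return {"a": 0.6, "b": 0.3, EOS: 0.1}
    return {"a": 0.2, "b": 0.2, EOS: 0.6}


best = grid_beam_search(toy_model, [("b",)], max_len=4, k=2)
```

Without constraints the toy model would prefer to open with `a`; requiring `b` steers the search to `("b", "</s>")`. Note that a token emitted by the horizontal (generate) move does not count toward coverage even if it happens to match a constraint token, which is why coverage is tracked per hypothesis rather than recomputed from the output string.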