Fast Structured Decoding for Sequence Models

Zhiqing Sun, Zhuohan Li, Haoqing Wang, Zi Lin, Di He, Zhi-Hong Deng

TL;DR

This work tackles the inference latency of autoregressive translation by augmenting non-autoregressive translation (NART) models with a structured, CRF-based decoder that captures dependencies between adjacent target tokens. It introduces two efficient approximations (low-rank and beam) and a dynamic transition mechanism that models positional context, reaching near-autoregressive accuracy with modest latency overhead (8 to 14 ms). On WMT14 and IWSLT14, the model shows substantial BLEU gains over prior NART methods and comes close to autoregressive models (26.80 BLEU on WMT14 En-De, 0.61 below purely autoregressive models). The approach offers a principled structured decoding framework for MT and demonstrates the practical viability of structured inference in fast sequence modeling.

Abstract

Autoregressive sequence models achieve state-of-the-art performance in domains like machine translation. However, because of their autoregressive factorization, these models suffer from heavy latency during inference. Recently, non-autoregressive sequence models were proposed to reduce the inference time. However, these models assume that the decoding of each token is conditionally independent of the others. Such a generation process sometimes makes the output sentence inconsistent, and thus the learned non-autoregressive models could only achieve inferior accuracy compared to their autoregressive counterparts. To improve the decoding consistency and reduce the inference cost at the same time, we propose to incorporate a structured inference module into the non-autoregressive models. Specifically, we design an efficient approximation for Conditional Random Fields (CRF) for non-autoregressive sequence models, and further propose a dynamic transition technique to model positional contexts in the CRF. Experiments in machine translation show that, while adding little latency (8 to 14 ms), our model achieves significantly better translation performance than previous non-autoregressive models on different translation datasets. In particular, on the WMT14 En-De dataset, our model obtains a BLEU score of 26.80, which substantially outperforms the previous non-autoregressive baselines and is only 0.61 BLEU lower than purely autoregressive models.
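
To make the idea of structured decoding concrete, the following is a minimal sketch of Viterbi decoding over a linear-chain CRF whose transition scores are factored through low-rank token embeddings, with candidates restricted to the top-scoring tokens at each position. It is not the authors' implementation: the function name `viterbi_low_rank`, the embedding tables `e_left` and `e_right`, and the toy vocabulary and scores are all hypothetical and chosen only for illustration. The sketch uses a single static transition table, so it omits the dynamic transition technique as well as CRF training.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def viterbi_low_rank(
    emissions: List[List[float]],  # emissions[t][y]: score of token y at position t
    e_left: List[List[float]],     # e_left[y]: low-rank embedding of y as the previous token
    e_right: List[List[float]],    # e_right[y]: low-rank embedding of y as the next token
    beam: int,                     # number of candidate tokens kept per position
) -> Tuple[List[int], float]:
    """Return the best-scoring token sequence (within the beam) and its score.

    The transition score from token p to token y is dot(e_left[p], e_right[y]),
    so the full vocabulary-by-vocabulary matrix is never materialized.
    """
    # Beam approximation (assumed form): keep the top-`beam` tokens per position.
    cands = [
        sorted(range(len(row)), key=lambda y: row[y], reverse=True)[:beam]
        for row in emissions
    ]
    # best[i]: best score of any path ending in candidate i of the current position.
    best = [emissions[0][y] for y in cands[0]]
    back: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for y in cands[t]:
            scores = [
                best[i] + dot(e_left[p], e_right[y])
                for i, p in enumerate(cands[t - 1])
            ]
            i_max = max(range(len(scores)), key=scores.__getitem__)
            new_best.append(scores[i_max] + emissions[t][y])
            pointers.append(i_max)
        best = new_best
        back.append(pointers)
    # Backtrack from the best final candidate.
    i = max(range(len(best)), key=best.__getitem__)
    score = best[i]
    path = [cands[-1][i]]
    for t in range(len(back) - 1, -1, -1):
        i = back[t][i]
        path.append(cands[t][i])
    path.reverse()
    return path, score


# Toy vocabulary (hypothetical): 0="thank", 1="many", 2="you", 3="thanks".
emissions = [[1.0, 1.2, 0.0, 0.0], [0.0, 0.0, 1.5, 1.0]]
e_left = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
e_right = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

# Independent per-position argmax, as in a plain non-autoregressive decoder.
greedy = [max(range(len(row)), key=row.__getitem__) for row in emissions]
path, score = viterbi_low_rank(emissions, e_left, e_right, beam=2)
```

In this toy setup, the independent argmax picks "many you", mixing two valid translations, while the CRF path is "thank you" because the transition term rewards compatible adjacent tokens.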

Paper Structure

This paper contains 22 sections, 9 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Illustration of the Transformer model and our Transformer-based NART-CRF model.
  • Figure 2: Illustration of the decoding inconsistency problem in non-autoregressive decoding and how a CRF-based structured inference module solves it.