End-to-End Non-Autoregressive Neural Machine Translation with Connectionist Temporal Classification

Jindřich Libovický, Jindřich Helcl

TL;DR

This paper introduces an end-to-end non-autoregressive neural machine translation model based on Connectionist Temporal Classification (CTC), which generates all output tokens in parallel. The encoder outputs are projected to a sequence of length $kT_x$, where $T_x$ is the source length and $k$ is a fixed factor. Each position is labeled with either an output token or a null symbol, and training with CTC sums over all possible alignments of the target to these positions, so no multi-step inference is needed. Evaluations on English–Romanian and English–German show BLEU scores competitive with prior non-autoregressive methods and substantial decoding speedups (approximately 4x) over autoregressive baselines, though a quality gap to those baselines remains that beam search and external language models may help close. The work offers a viable path to fast, end-to-end non-autoregressive translation and suggests practical avenues for further optimization and integration into MT pipelines.
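
The two mechanical steps named in the summary above, expanding $T_x$ encoder positions into $kT_x$ output positions and reading a translation off a token-or-null labeling, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `<null>` symbol, and the choice to split each already-projected vector into $k$ equal slices are hypothetical, and the learned projection and the CTC loss itself are omitted.

```python
# Hypothetical name for the CTC null (blank) label; not taken from the paper.
BLANK = "<null>"


def split_states(projected, k):
    """Turn T_x vectors of size k*d into k*T_x vectors of size d.

    Assumes each encoder state was already linearly projected to k*d values;
    slicing into k equal parts is one plausible way to get k*T_x positions.
    """
    states = []
    for vec in projected:
        if len(vec) % k != 0:
            raise ValueError("vector length must be divisible by k")
        d = len(vec) // k
        for b in range(k):
            states.append(vec[b * d:(b + 1) * d])
    return states


def ctc_collapse(labels, blank=BLANK):
    """Standard CTC read-out: merge consecutive repeats, then drop nulls."""
    output = []
    previous = None
    for label in labels:
        if label != previous and label != blank:
            output.append(label)
        previous = label
    return output


if __name__ == "__main__":
    # Two source positions, k = 2, d = 2: four output positions.
    print(split_states([[1, 2, 3, 4], [5, 6, 7, 8]], k=2))
    # A null between two identical tokens keeps both of them.
    print(ctc_collapse(["a", "a", BLANK, "a", "b", BLANK, BLANK, "b"]))
```

Because many different labelings collapse to the same output, CTC training sums the probabilities of all of them, which is why the model does not need to predict the target length or an alignment in a separate step.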

Abstract

Autoregressive decoding is the only part of sequence-to-sequence models that prevents them from massive parallelization at inference time. Non-autoregressive models enable the decoder to generate all output symbols independently in parallel. We present a novel non-autoregressive architecture based on connectionist temporal classification and evaluate it on the task of neural machine translation. Unlike other non-autoregressive methods which operate in several steps, our model can be trained end-to-end. We conduct experiments on the WMT English-Romanian and English-German datasets. Our models achieve a significant speedup over the autoregressive models, keeping the translation quality comparable to other non-autoregressive models.

Paper Structure

This paper contains 6 sections, 1 equation, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Scheme of the proposed architecture. The part between the encoder and the decoder is expressed by Equation 1.
  • Figure 2: Comparison of the sentence-level BLEU of our English-to-German autoregressive (AR) and non-autoregressive (NAR) models given the length of the source sentence.
  • Figure 3: Comparison of CPU decoding time by our autoregressive (AR) and non-autoregressive (NAR) models based on the source sentence length.