Control-DAG: Constrained Decoding for Non-Autoregressive Directed Acyclic T5 using Weighted Finite State Automata

Jinghong Chen, Weizhe Lin, Jingbiao Mei, Bill Byrne

TL;DR

This work tackles reliable non-autoregressive (NAR) natural language generation with DA-T5 by introducing Control-DAG, a constrained decoding framework that converts the model-produced DAG into a Weighted Finite State Automaton (WFSA) and enforces lexical, vocabulary, and length constraints. By integrating Hard Lexical Constraints, Vocabulary Constraints, and Length Constraints (and optionally a CBS-style constrained beam search), Control-DAG eliminates Out-Of-Vocabulary (OOV) errors and ensures that specified entities appear, while maintaining fast, DAG-compatible decoding. The approach achieves state-of-the-art NAR results on the Schema Guided Dialogue (SGD) and Data-to-Text (DART) tasks, with zero slot errors and zero neologisms on SGD and strong BLEU/BLEURT gains on both datasets, while decoding faster than autoregressive (AR) baselines. These results demonstrate the practical viability of constrained WFSA-based decoding for NAR NLG and highlight the potential of automata-theoretic methods to address longstanding issues in NAR generation.

Abstract

The Directed Acyclic Transformer is a fast non-autoregressive (NAR) model that performs well in Neural Machine Translation. Two issues prevent its application to general Natural Language Generation (NLG) tasks: frequent Out-Of-Vocabulary (OOV) errors and the inability to faithfully generate entity names. We introduce Control-DAG, a constrained decoding algorithm for our Directed Acyclic T5 (DA-T5) model which offers lexical, vocabulary and length control. We show that Control-DAG significantly enhances DA-T5 on the Schema Guided Dialogue and the DART datasets, establishing strong NAR results for Task-Oriented Dialogue and Data-to-Text NLG.

Paper Structure

This paper contains 37 sections, 3 equations, 2 figures, 3 tables, 3 algorithms.

Figures (2)

  • Figure 1: Control-DAG with lexical, vocabulary, and length constraints. 1. Directed Acyclic T5 (DA-T5) takes the input text and generates a Directed Acyclic Graph (DAG). 2. The DAG is pruned by likelihood, keeping the $K_e$ most likely output tokens and the $K_t$ most likely outgoing arcs, and is converted into a Weighted Finite State Automaton (WFSA). WFSA vertices and arcs are shown in the upper-right corner. 3. For lexical and vocabulary constraints, constraint FSAs are built from equivalent regular expressions (Sec. 3.1). The length target predictor is a simple linear predictor based on the input sequence length (Sec. 4). 4. The WFSA is intersected with the constraint FSAs to obtain a constrained WFSA that contains only hypotheses satisfying all lexical and vocabulary constraints. 5. DFS-Viterbi is used to obtain the most likely string in the constrained WFSA that satisfies the length constraint.
  • Figure 2: Case study comparing DA-T5 with Control-DAG, Joint Viterbi, and CBS-DAG decoding on the SGD dataset.
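
The pipeline in the Figure 1 caption can be illustrated with a small sketch. This is not the authors' implementation: the toy DAG, the function names (`dag_to_wfsa`, `phrase_step`, `constrained_best`), and the probabilities are all hypothetical. The intersection with the constraint FSA is done lazily by tracking a product state, the paper's DFS-Viterbi is replaced by a plain dynamic program over (vertex, constraint state, length), and the length constraint is assumed to mean exactly `target_len` tokens. Vocabulary constraints are omitted for brevity.

```python
import math
from collections import defaultdict

def prune_top_k(scored, k):
    """Keep the k highest-scoring (item, log_prob) pairs."""
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

def dag_to_wfsa(emissions, transitions, k_e, k_t):
    """Turn a pruned DAG into arcs[u] = [(token, v, log_weight), ...].

    An arc u -> v carries a token emitted at v; its weight is the
    transition log-prob plus the emission log-prob.
    """
    arcs = defaultdict(list)
    for u, outgoing in transitions.items():
        for v, t_lp in prune_top_k(outgoing, k_t):
            for tok, e_lp in prune_top_k(emissions[v], k_e):
                arcs[u].append((tok, v, t_lp + e_lp))
    return arcs

def phrase_step(phrase, q, tok):
    """One step of a DFA accepting strings that contain `phrase`.

    State q counts how many tokens of the phrase are currently matched;
    q == len(phrase) is an absorbing accepting state.
    """
    if q == len(phrase):
        return q
    seq = phrase[:q] + (tok,)
    for k in range(min(len(seq), len(phrase)), 0, -1):
        if seq[len(seq) - k:] == phrase[:k]:
            return k
    return 0

def constrained_best(arcs, start, final, phrase, target_len):
    """Best start -> final path containing `phrase` with target_len tokens.

    Assumes vertex indices are a topological order (arcs go u -> v, v > u).
    Returns (log_score, tokens) or None if no hypothesis qualifies.
    """
    best = defaultdict(dict)  # best[v][(q, n)] = (log_score, tokens)
    best[start][(0, 0)] = (0.0, ())
    for u in range(start, final):
        for (q, n), (score, toks) in list(best[u].items()):
            if n == target_len:
                continue  # cannot extend without exceeding the target
            for tok, v, w in arcs.get(u, []):
                key = (phrase_step(phrase, q, tok), n + 1)
                cand = (score + w, toks + (tok,))
                if key not in best[v] or cand[0] > best[v][key][0]:
                    best[v][key] = cand
    return best[final].get((len(phrase), target_len))

# Hypothetical toy DAG: vertex 0 is the start, vertex 4 is the end.
lg = math.log
emissions = {
    1: [("the", lg(0.6)), ("a", lg(0.4))],
    2: [("hotel", lg(0.6)), ("Hilton", lg(0.4))],
    3: [("is", lg(0.7)), ("Hilton", lg(0.3))],
    4: [("booked", lg(0.9)), ("book", lg(0.1))],
}
transitions = {
    0: [(1, lg(1.0))],
    1: [(2, lg(0.6)), (3, lg(0.4))],
    2: [(3, lg(0.5)), (4, lg(0.5))],
    3: [(4, lg(1.0))],
}

arcs = dag_to_wfsa(emissions, transitions, k_e=2, k_t=2)
result = constrained_best(arcs, start=0, final=4,
                          phrase=("Hilton",), target_len=4)
print(result[1])  # ('the', 'Hilton', 'is', 'booked')
```

In this toy graph the entity "Hilton" is forced to appear and the output is forced to four tokens, so the search settles on the path through all four emitting vertices. Asking for a length no path can reach (for example `target_len=5`) returns `None`.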