DINGO: Constrained Inference for Diffusion LLMs

Tarun Suresh, Debangshu Banerjee, Shubham Ugare, Sasa Misailovic, Gagandeep Singh

TL;DR

DINGO introduces the first provably constrained, distribution-preserving decoding framework for diffusion LLMs, enforcing user-defined regular expressions across output blocks. By constructing a token-level DFA and solving a dynamic program over it, DINGO finds the output with the highest model probability among valid strings while guaranteeing that every output is a prefix of some string in the target language. The approach delivers substantial gains on symbolic math and JSON-generation benchmarks, with improvements of up to $68$ percentage points and $100\%$ syntactic/schema validity in several settings. This work enables reliable structured-output generation with diffusion LLMs for tasks that require formal guarantees, with practical implications for reasoning and tool use in real-world applications.
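
The dynamic program described above can be illustrated with a minimal Viterbi-style sketch. It assumes a hypothetical toy regex `[0-9]+(\+[0-9]+)*`, a tiny vocabulary, and block positions whose token distributions are independent (so a block's probability is the product of per-position probabilities). `char_step`, `token_step`, and `best_valid_block` are illustrative names, not the authors' code, and the paper's handling of the mask token $\bot$ is not modeled. For each DFA state the sketch keeps only the most probable prefix reaching that state and discards any token that sends the DFA to the dead state, so every returned block is a prefix of some string in the language.

```python
import math

# Hypothetical toy DFA for the regex [0-9]+(\+[0-9]+)*  (illustration only).
# States: 0 = start, 1 = inside a number (accepting), 2 = just read '+'.
# None is the dead state; 0, 1 and 2 can all still reach state 1, so they are live.
DIGITS = "0123456789"


def char_step(q, c):
    """One character-level DFA transition."""
    if q is None:
        return None
    if c in DIGITS:
        return 1
    if c == "+" and q == 1:
        return 2
    return None


def token_step(q, tok):
    """Token-level transition: feed the token's characters one at a time."""
    for c in tok:
        q = char_step(q, c)
    return q


def best_valid_block(probs, start=0):
    """Most probable block whose DFA run never enters the dead state.

    probs[i] maps tokens to probabilities at block position i; positions are
    assumed independent, so a block's probability is the product of its entries.
    """
    # best[q] = (log-probability, tokens) of the best prefix that ends in state q
    best = {start: (0.0, [])}
    for dist in probs:
        nxt = {}
        for q, (lp, toks) in best.items():
            for tok, p in dist.items():
                q2 = token_step(q, tok)
                if p <= 0.0 or q2 is None:
                    continue  # zero-probability token, or the prefix can no longer be completed
                cand = lp + math.log(p)
                if q2 not in nxt or cand > nxt[q2][0]:
                    nxt[q2] = (cand, toks + [tok])
        best = nxt
        if not best:
            return None, 0.0  # no valid block exists
    lp, toks = max(best.values(), key=lambda entry: entry[0])
    return toks, math.exp(lp)


probs = [
    {"+": 0.6, "1": 0.3, "x": 0.1},
    {"+": 0.5, "2": 0.4, "x": 0.1},
    {"x": 0.7, "1": 0.2, "+": 0.1},
]
tokens, prob = best_valid_block(probs)
print(tokens, round(prob, 4))  # ['1', '+', '1'] 0.03
```

Keeping one best prefix per DFA state is what keeps this tractable: the work is roughly $d \cdot |Q| \cdot |V|$ token transitions rather than the $|V|^d$ needed to enumerate every block.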

Abstract

Diffusion LLMs have emerged as a promising alternative to conventional autoregressive LLMs, offering significant potential for improved runtime efficiency. However, existing diffusion models lack the ability to provably enforce user-specified formal constraints, such as regular expressions, which makes them unreliable for tasks that require structured outputs, such as fixed-schema JSON generation. Unlike autoregressive models that generate tokens sequentially, diffusion LLMs predict a block of tokens in parallel. This parallelism makes traditional constrained decoding algorithms, which are designed for sequential token prediction, ineffective at preserving the true output distribution. To address this limitation, we propose DINGO, a dynamic programming-based constrained decoding strategy that is both efficient and provably distribution-preserving. DINGO enables sampling of output strings with the highest probability under the model's predicted distribution, while strictly satisfying any user-specified regular expression. On standard symbolic math and JSON generation benchmarks, DINGO achieves up to a 68 percentage point improvement over unconstrained inference.
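
The claim that sequential constrained decoding does not carry over to parallel block prediction can be seen on a small example. The sketch below uses a hypothetical toy DFA for `[0-9]+(\+[0-9]+)*` and made-up per-position distributions (none of this is from the paper). A greedy left-to-right decoder masks tokens that would kill the DFA and commits to the argmax at each position; it picks `+` at position 2, which leaves only a low-probability digit for position 3. Exhaustive enumeration over all valid blocks finds a far more probable one. `greedy_constrained` and `best_by_enumeration` are our illustrative routines, not the baseline implementation used in the paper.

```python
import itertools

# Hypothetical toy DFA for the regex [0-9]+(\+[0-9]+)*  (illustration only).
# States: 0 = start, 1 = inside a number (accepting), 2 = just read '+'; None = dead.
DIGITS = "0123456789"


def char_step(q, c):
    if q is None:
        return None
    if c in DIGITS:
        return 1
    if c == "+" and q == 1:
        return 2
    return None


def token_step(q, tok):
    for c in tok:
        q = char_step(q, c)
    return q


def greedy_constrained(probs, start=0):
    """Left to right: mask tokens that kill the DFA, then commit to the argmax."""
    q, toks, prob = start, [], 1.0
    for dist in probs:
        allowed = {t: p for t, p in dist.items() if p > 0.0 and token_step(q, t) is not None}
        if not allowed:
            return None, 0.0
        tok = max(allowed, key=allowed.get)
        q = token_step(q, tok)
        toks.append(tok)
        prob *= allowed[tok]
    return toks, prob


def best_by_enumeration(probs, vocab, start=0):
    """Score every block in vocab^d whose DFA run stays alive; exponential in d."""
    best_toks, best_prob = None, 0.0
    for seq in itertools.product(vocab, repeat=len(probs)):
        q, prob = start, 1.0
        for tok, dist in zip(seq, probs):
            q = token_step(q, tok)
            prob *= dist.get(tok, 0.0)
        if q is not None and prob > best_prob:
            best_toks, best_prob = list(seq), prob
    return best_toks, best_prob


vocab = ["1", "+", "x"]
probs = [
    {"1": 0.9, "x": 0.1},
    {"+": 0.6, "1": 0.4},
    {"+": 0.95, "1": 0.05},
]
g_toks, g_prob = greedy_constrained(probs)
b_toks, b_prob = best_by_enumeration(probs, vocab)
print(g_toks, round(g_prob, 3))  # ['1', '+', '1'] 0.027
print(b_toks, round(b_prob, 3))  # ['1', '1', '+'] 0.342
```

Both outputs are valid prefixes, but the greedy one is more than ten times less probable. Enumeration costs $|V|^d$ and is only usable at toy sizes; here it serves as the ground truth that an efficient dynamic program must match.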

Paper Structure

This paper contains 28 sections, 4 theorems, 5 equations, 5 figures, 7 tables, 5 algorithms.

Key Result

Proposition 4.0

[Correctness] Given any regular expression $\mathcal{R}$, input prompt $\pmb{p}\in V^{m}$, block length $d$, and output distribution $\mathcal{D}_{m+d} = \pmb{v}_1\dots\pmb{v}_{m+d}$, if $L_{P}(\mathcal{R}) \cap (V \setminus \{\bot\})^{d} \neq \emptyset$ and $\pmb{r} \sim \pmb{v}_{m+1}\dots\pmb{v}_{m+d}$ be th…
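
The precondition $L_{P}(\mathcal{R}) \cap (V \setminus \{\bot\})^{d} \neq \emptyset$ says that at least one mask-free block of $d$ tokens keeps the output a valid prefix. Reading $L_P(\mathcal{R})$ as the set of token sequences whose concatenation is a prefix of some string in $L(\mathcal{R})$ (our assumption, since the statement is cut off here), the condition can be checked by propagating the set of reachable non-dead DFA states for $d$ token steps. A minimal sketch with a hypothetical toy DFA for `[0-9]+(\+[0-9]+)*`; `has_valid_block` is an illustrative name:

```python
# Hypothetical toy DFA for the regex [0-9]+(\+[0-9]+)*  (illustration only).
# States: 0 = start, 1 = inside a number (accepting), 2 = just read '+'; None = dead.
# Every non-dead state can still reach state 1, so "non-dead" and "live" coincide here.
DIGITS = "0123456789"


def char_step(q, c):
    if q is None:
        return None
    if c in DIGITS:
        return 1
    if c == "+" and q == 1:
        return 2
    return None


def token_step(q, tok):
    for c in tok:
        q = char_step(q, c)
    return q


def has_valid_block(vocab, d, start=0):
    """True iff some block of d mask-free tokens from vocab keeps the DFA live."""
    frontier = {start}
    for _ in range(d):
        frontier = {token_step(q, t) for q in frontier for t in vocab} - {None}
        if not frontier:
            return False
    return True


print(has_valid_block(["1", "+"], 3))  # True
print(has_valid_block(["+", "x"], 2))  # False
```

Because the dead state is absorbing, checking the states after the last step is enough: any block whose run passes through the dead state ends there.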

Figures (5)

  • Figure 1: Ablation study on the number of diffusion blocks for GSM-Symbolic
  • Figure 2: An example from the GSM-Symbolic dataset (variables in blue), where unconstrained generation produces syntactically incorrect output, and greedy constrained generation yields a syntactically valid but incorrect answer. In contrast, DINGO generates the correct answer.
  • Figure 3: An example from the GSM-Symbolic dataset (variables in blue), where unconstrained generation produces syntactically incorrect output, and greedy constrained generation yields a syntactically valid but incorrect answer. In contrast, DINGO generates the correct answer.
  • Figure 4: An example from JSON generation, where unconstrained generation produces a syntactically incorrect output, and greedy constrained generation yields a valid but incomplete prefix. In contrast, DINGO generates a syntactically correct answer.
  • Figure 5: An example from JSON generation, where unconstrained generation produces a syntactically incorrect output, and greedy constrained generation yields a valid but incomplete prefix. In contrast, DINGO generates a syntactically correct answer.

Theorems & Definitions (14)

  • Definition 2.1: Diffusion step
  • Definition 2.2: Single-block diffusion LLM
  • Definition 2.3: Semi-autoregressive diffusion LLM
  • Definition 2.4
  • Definition 2.5: Extended transition function
  • Definition 2.6: Live DFA states
  • Definition 3.1: Substitution set
  • Definition 3.2: Correctness of constrained decoder
  • Proposition 4.0
  • Proposition 4.0
  • ...and 4 more
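
Definitions 2.5 and 2.6 appear above by title only. Under their standard automata-theory meaning (our reading, not quoted from the paper), the extended transition function $\delta^*(q, w)$ runs the DFA from state $q$ over every character of $w$, and a state is live if some accepting state is reachable from it. The hypothetical helpers below (`extended_delta`, `live_states`, `token_transitions`) use both to build a token-level transition table that keeps only moves into live states, which is one plausible way to realize the token-level DFA mentioned in the TL;DR; the character DFA and vocabulary are made up for illustration.

```python
from collections import deque


def extended_delta(delta, q, word):
    """delta*(q, w): run the character DFA from state q over every character of w."""
    for c in word:
        q = delta.get((q, c))  # a missing entry means the implicit dead state
        if q is None:
            return None
    return q


def live_states(states, delta, accepting):
    """States from which some accepting state is reachable (backward BFS)."""
    rev = {q: set() for q in states}
    for (q, _c), q2 in delta.items():
        rev[q2].add(q)
    live, queue = set(accepting), deque(accepting)
    while queue:
        q = queue.popleft()
        for p in rev[q]:
            if p not in live:
                live.add(p)
                queue.append(p)
    return live


def token_transitions(states, delta, vocab, live):
    """Token-level DFA: keep only token moves that land in a live state."""
    table = {}
    for q in states:
        for tok in vocab:
            q2 = extended_delta(delta, q, tok)
            if q2 in live:
                table[(q, tok)] = q2
    return table


# Hypothetical character DFA for the regex ab*c, plus an unproductive trap state 3 on 'x'.
states = [0, 1, 2, 3]
delta = {(0, "a"): 1, (1, "b"): 1, (1, "c"): 2, (0, "x"): 3, (3, "x"): 3}
accepting = {2}
vocab = ["a", "ab", "bbc", "x", "ca"]

live = live_states(states, delta, accepting)
table = token_transitions(states, delta, vocab, live)
print(sorted(live))  # [0, 1, 2]
print(table)  # {(0, 'a'): 1, (0, 'ab'): 1, (1, 'bbc'): 2}
```

The trap state 3 is reachable on `x` but cannot reach an accepting state, so the token `x` is pruned even though the character DFA has a transition for it.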