Continuous Diffusion Models Can Obey Formal Syntax

Jinwoo Kim, Taylor Berg-Kirkpatrick, Loris D'Antoni

TL;DR

The paper tackles enforcing formal syntactic constraints, expressed as regular expressions, in continuous diffusion language models. It introduces a training-free guidance mechanism that uses the gradient of the analytically computed probability that a latent state decodes to a valid string, steering the denoising process without auxiliary classifiers. Diffinity, built on PLAID, achieves high constraint satisfaction across JSON and natural-language benchmarks while preserving generation quality and maintaining distributional fidelity, outperforming autoregressive constrained decoding in many settings. The approach hinges on a tokenizer-aligned automaton framework and dynamic-programming computation of expected constraint satisfaction, linking to classifier guidance and enabling scalable, structure-conscious generation in diffusion models. Overall, this work demonstrates that formal syntax can be reliably enforced in diffusion-based text generation with minimal training overhead, albeit with notable computational considerations that motivate future optimizations.

Abstract

Diffusion language models offer a promising alternative to autoregressive models due to their global, non-causal generation process, but their continuous latent dynamics make discrete constraints -- e.g., the output should be a JSON file that matches a given schema -- difficult to impose. We introduce a training-free guidance method for steering continuous diffusion language models to satisfy formal syntactic constraints expressed using regular expressions. Our approach constructs an analytic score estimating the probability that a latent state decodes to a valid string accepted by a given regular expression, and uses its gradient to guide sampling, without training auxiliary classifiers. The denoising process targets the base model conditioned on syntactic validity. We implement our method in Diffinity on top of the PLAID diffusion model and evaluate it on 180 regular-expression constraints over JSON and natural-language benchmarks. Diffinity achieves 68-96% constraint satisfaction while incurring only a small perplexity cost relative to unconstrained sampling, outperforming autoregressive constrained decoding in both constraint satisfaction and output quality.
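
The guidance idea in the abstract can be illustrated with a toy sketch: compute the log-probability that per-position token distributions produce a string accepted by a regex, then nudge the underlying logits along the gradient of that score. Everything here is an assumption made for illustration: the three-position logits stand in for a latent state, the DFA encodes the regex c(a|u)t, the gradient is taken by finite differences rather than automatic differentiation, and the diffusion model's own denoising update is omitted entirely. It is not the Diffinity implementation.

```python
import math

# Hypothetical toy setup: vocabulary and DFA for the regex c(a|u)t.
VOCAB = ["a", "c", "r", "t", "u"]
DELTA = {(0, "c"): 1, (1, "a"): 2, (1, "u"): 2, (2, "t"): 3}
START, ACCEPT = 0, 3

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    z = sum(exps)
    return [e / z for e in exps]

def log_p_valid(logits):
    """Log-probability that independently decoded tokens form an accepted string."""
    mass = {START: 1.0}                      # probability mass per DFA state
    for row in logits:
        probs, nxt = softmax(row), {}
        for q, p_q in mass.items():
            for tok, p_tok in zip(VOCAB, probs):
                r = DELTA.get((q, tok))
                if r is not None:            # mass on invalid prefixes is dropped
                    nxt[r] = nxt.get(r, 0.0) + p_q * p_tok
        mass = nxt
    return math.log(mass.get(ACCEPT, 0.0))

def grad(logits, eps=1e-5):
    """Central finite-difference gradient of log_p_valid w.r.t. every logit."""
    g = [[0.0] * len(VOCAB) for _ in logits]
    for i in range(len(logits)):
        for j in range(len(VOCAB)):
            orig = logits[i][j]
            logits[i][j] = orig + eps
            up = log_p_valid(logits)
            logits[i][j] = orig - eps
            down = log_p_valid(logits)
            logits[i][j] = orig              # restore the original value
            g[i][j] = (up - down) / (2 * eps)
    return g

def guided_step(logits, scale=1.0):
    """One gradient-ascent step on the constraint score (guidance term only)."""
    g = grad(logits)
    return [[v + scale * gv for v, gv in zip(row, grow)]
            for row, grow in zip(logits, g)]

logits = [[0.0] * len(VOCAB) for _ in range(3)]   # uniform start: p = 2/125
before = log_p_valid(logits)
for _ in range(20):
    logits = guided_step(logits)
after = log_p_valid(logits)
```

Starting from uniform distributions, the probability of decoding `cat` or `cut` is 1/5 · 2/5 · 1/5 = 2/125, and repeated guided steps raise `after` above `before`. In an actual sampler this gradient would be combined with the base model's denoising direction, weighted by a guidance scale; the `scale` parameter above is only a stand-in for that weight.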

Paper Structure (34 sections, 2 theorems, 6 equations, 2 figures, 4 tables, 2 algorithms)

Key Result

Theorem 3.1

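Let $A$ be a DFA that represents the regular language $\mathcal{L}$. Then the automaton-score algorithm returns the expected probability $\mathbb{E}_{s \sim x}[s \in \mathcal{L}]$, i.e., the probability that a string $s$ decoded from the latent $x$ belongs to $\mathcal{L}$.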
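
The claim can be checked numerically on a small case. The sketch below is not the paper's algorithm listing; it is a minimal forward dynamic program over DFA states, compared against brute-force enumeration of every string. It assumes that each position is decoded independently from its own token distribution, and the DFA, vocabulary, and probabilities are made up for illustration.

```python
from itertools import product

# Hypothetical toy DFA for the regex c(a|u)t; state 3 is accepting.
# Missing transitions lead to an implicit dead state.
DELTA = {(0, "c"): 1, (1, "a"): 2, (1, "u"): 2, (2, "t"): 3}
START, ACCEPT = 0, {3}

def expected_acceptance(dists):
    """Forward DP: mass[q] is the probability of being in state q so far."""
    mass = {START: 1.0}
    for dist in dists:                       # one token distribution per position
        nxt = {}
        for q, p_q in mass.items():
            for tok, p_tok in dist.items():
                r = DELTA.get((q, tok))
                if r is not None:            # dead-state mass is simply dropped
                    nxt[r] = nxt.get(r, 0.0) + p_q * p_tok
        mass = nxt
    return sum(p for q, p in mass.items() if q in ACCEPT)

def accepts(tokens):
    """Run the DFA on a concrete token sequence."""
    q = START
    for tok in tokens:
        q = DELTA.get((q, tok))
        if q is None:
            return False
    return q in ACCEPT

def brute_force(dists):
    """Sum the probability of every accepted string by explicit enumeration."""
    total = 0.0
    for combo in product(*(d.items() for d in dists)):
        if accepts([tok for tok, _ in combo]):
            p = 1.0
            for _, p_tok in combo:
                p *= p_tok
            total += p
    return total

# Made-up per-position distributions over the vocabulary {a, c, r, t, u}.
dists = [
    {"a": 0.1, "c": 0.6, "r": 0.1, "t": 0.1, "u": 0.1},
    {"a": 0.5, "c": 0.1, "r": 0.05, "t": 0.05, "u": 0.3},
    {"a": 0.1, "c": 0.1, "r": 0.05, "t": 0.7, "u": 0.05},
]
dp_value = expected_acceptance(dists)
bf_value = brute_force(dists)
```

Both values equal 0.6 · (0.5 + 0.3) · 0.7 = 0.336 up to floating-point error. The brute-force sum visits all 5³ strings, whereas the dynamic program does work proportional to sequence length × number of states × vocabulary size, which is what makes the score practical to compute for longer sequences.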

Figures (2)

  • Figure 1: The automaton for the regex c(a|u)t describing the regular constraint $\mathcal{L}=\{\texttt{cat},\texttt{cut}\}$, the unigram distribution $\mathsf{Dec}(x_t)$ defined by the current latent $x_t$ on a vocabulary $\{\texttt{a}, \texttt{c}, \texttt{r}, \texttt{t}, \texttt{u}\}$ of size 5, and transition matrices for this automaton and latent space for a sequence of length 3. Valid transitions inside the automaton and their probabilities are color-coded.
  • Figure 2: Guidance-scale trade-offs, PLAID perplexity comparison, and computational overhead.
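
The setting of Figure 1 can be reproduced in a few lines. The sketch below builds one transition matrix per position for the c(a|u)t automaton and multiplies them through a start-state vector. The probabilities in `DEC` are invented for illustration (the figure's actual values are not reproduced here), and the explicit dead state and helper names are assumptions of this sketch.

```python
VOCAB = ["a", "c", "r", "t", "u"]
# States 0..3 follow c(a|u)t; state 4 is an explicit dead (sink) state.
N_STATES, START, ACCEPT, DEAD = 5, 0, 3, 4
DELTA = {(0, "c"): 1, (1, "a"): 2, (1, "u"): 2, (2, "t"): 3}

def transition_matrix(dist):
    """T[q][r] = total probability of the tokens that move state q to state r."""
    T = [[0.0] * N_STATES for _ in range(N_STATES)]
    for q in range(N_STATES):
        for tok, p in zip(VOCAB, dist):
            T[q][DELTA.get((q, tok), DEAD)] += p
    return T

def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[q] * M[q][r] for q in range(len(v))) for r in range(len(M[0]))]

# Illustrative (made-up) unigram distributions over VOCAB for 3 positions.
DEC = [
    [0.1, 0.6, 0.1, 0.1, 0.1],
    [0.5, 0.1, 0.05, 0.05, 0.3],
    [0.1, 0.1, 0.05, 0.7, 0.05],
]

state = [1.0 if q == START else 0.0 for q in range(N_STATES)]
for dist in DEC:
    state = vec_mat(state, transition_matrix(dist))
p_valid = state[ACCEPT]   # probability of decoding "cat" or "cut"
```

With these numbers `p_valid` is 0.6 · 0.8 · 0.7 = 0.336, and the remaining mass sits in the dead state. Keeping the dead state explicit makes every matrix row sum to 1, which is a convenient sanity check; dropping it gives the same acceptance probability.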

Theorems & Definitions (7)

  • Example 2.1: Transition Matrices
  • Example 2.2: Computing Expected Probability
  • Theorem 3.1: Expected Probability
  • Definition 4.1: Deterministic Finite-State Automaton
  • Theorem 5.1: Vocabulary Alignment
  • proof
  • proof