Continuous Diffusion Models Can Obey Formal Syntax
Jinwoo Kim, Taylor Berg-Kirkpatrick, Loris D'Antoni
TL;DR
The paper addresses how to enforce formal syntactic constraints, expressed as regular expressions, in continuous diffusion language models. It introduces a training-free guidance mechanism that steers the denoising process with the gradient of an analytically computed probability that a latent state decodes to a valid string, so no auxiliary classifier is needed. Diffinity, the implementation built on PLAID, achieves high constraint satisfaction across JSON and natural-language benchmarks while preserving generation quality and distributional fidelity, and it outperforms autoregressive constrained decoding in many settings. The approach relies on a tokenizer-aligned automaton framework and a dynamic-programming computation of expected constraint satisfaction, which connects it to classifier guidance and enables scalable, structure-aware generation in diffusion models. Overall, the work shows that formal syntax can be reliably enforced in diffusion-based text generation without additional training, though the computational cost of guidance motivates future optimization.
Abstract
Diffusion language models offer a promising alternative to autoregressive models due to their global, non-causal generation process, but their continuous latent dynamics make discrete constraints (e.g., the output should be a JSON file that matches a given schema) difficult to impose. We introduce a training-free guidance method for steering continuous diffusion language models to satisfy formal syntactic constraints expressed using regular expressions. Our approach constructs an analytic score estimating the probability that a latent state decodes to a valid string accepted by a given regular expression, and uses its gradient to guide sampling, without training auxiliary classifiers. The denoising process targets the base model conditioned on syntactic validity. We implement our method in Diffinity on top of the PLAID diffusion model and evaluate it on 180 regular-expression constraints over JSON and natural-language benchmarks. Diffinity achieves 68-96% constraint satisfaction while incurring only a small perplexity cost relative to unconstrained sampling, outperforming autoregressive constrained decoding in both constraint satisfaction and output quality.
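To make the idea of an analytic acceptance probability concrete, the following is a minimal sketch rather than the Diffinity implementation. It assumes that each position has an independent categorical distribution over tokens and that the regular expression has already been compiled into a token-level DFA. The function name `acceptance_probability`, the toy DFA for `a+b`, and the example distributions are all hypothetical. A forward dynamic program propagates probability mass over automaton states and sums the mass that ends in an accepting state.

```python
from typing import Dict, Hashable, List, Set, Tuple

State = Hashable
Token = str


def acceptance_probability(
    token_probs: List[Dict[Token, float]],
    transitions: Dict[Tuple[State, Token], State],
    start: State,
    accepting: Set[State],
) -> float:
    """Probability that a sequence sampled position by position is accepted.

    Assumption (illustrative only): positions are independent, and a missing
    (state, token) entry in `transitions` means the string is rejected.
    """
    # mass[s] = probability of being in DFA state s after the tokens read so far
    mass: Dict[State, float] = {start: 1.0}
    for dist in token_probs:
        next_mass: Dict[State, float] = {}
        for state, p_state in mass.items():
            for token, p_token in dist.items():
                target = transitions.get((state, token))
                if target is None:
                    continue  # no transition: this path is dead
                next_mass[target] = next_mass.get(target, 0.0) + p_state * p_token
        mass = next_mass
    # Sum the mass that ends in an accepting state
    return sum((p for s, p in mass.items() if s in accepting), 0.0)


# Hypothetical token-level DFA for the regular expression a+b:
# state 0 = start, state 1 = seen one or more "a", state 2 = seen the final "b".
transitions = {(0, "a"): 1, (1, "a"): 1, (1, "b"): 2}

# Hypothetical per-position token distributions for a length-3 sequence.
token_probs = [
    {"a": 0.9, "b": 0.1},
    {"a": 0.5, "b": 0.5},
    {"a": 0.2, "b": 0.8},
]

p_valid = acceptance_probability(token_probs, transitions, start=0, accepting={2})
# Only "aab" is accepted at length 3, so p_valid is 0.9 * 0.5 * 0.8, about 0.36.
print(p_valid)
```

Because the result is a sum of products of the per-position probabilities, it is differentiable with respect to them. In a guidance setting of the kind the abstract describes, the same computation would be written in an automatic-differentiation framework so that the gradient of the (log) acceptance probability with respect to the latent state can steer sampling; that wiring is omitted here.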
