
Constrained Discrete Diffusion

Michael Cardei, Jacob K Christopher, Thomas Hartvigsen, Bhavya Kailkhura, Ferdinando Fioretto

TL;DR

This work introduces Constrained Discrete Diffusion (CDD), a training-free framework that embeds a differentiable constraint projection into the reverse process of discrete diffusion models to enforce global, sequence-level constraints. CDD formulates projections as KL minimizations and solves them with an Augmented Lagrangian method, combined with a Gumbel-Softmax relaxation for differentiability. With this approach it achieves zero constraint violations across toxicity mitigation, molecule design, and instruction-following tasks while maintaining fluency and coherence. Theoretical guarantees show convergence to feasible regions with bounded KL drift. Extensive experiments demonstrate state-of-the-art constraint adherence compared with autoregressive and existing discrete diffusion baselines. Overall, CDD provides a scalable, principled method for controllable generation in sensitive and domain-specific applications, without requiring retraining or post-hoc filtering.
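The projection step described above can be sketched in a few dozen lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The following are all hypothetical:

- the function name `project_kl_alm`;
- the linear surrogate constraint g(y) = sum(w * y) - budget <= 0, which stands in for, e.g., a toxicity classifier;
- the particular augmented Lagrangian variant;
- all hyperparameters.

The optimization starts from the model's own log-probabilities. The KL term keeps the projected distribution close to the model's distribution. A single fixed Gumbel draw makes the constraint a differentiable function of the logits through a Gumbel-Softmax sample.

```python
import numpy as np


def log_softmax(u: np.ndarray) -> np.ndarray:
    """Numerically stable row-wise log-softmax."""
    u = u - u.max(axis=-1, keepdims=True)
    return u - np.log(np.exp(u).sum(axis=-1, keepdims=True))


def softmax_vjp(s: np.ndarray, grad_s: np.ndarray) -> np.ndarray:
    """Backpropagate grad_s through s = softmax(u), row-wise."""
    return s * (grad_s - (s * grad_s).sum(axis=-1, keepdims=True))


def project_kl_alm(p, w, budget, tau=0.5, lr=0.1, inner_steps=50,
                   outer_steps=20, rho=1.0, rho_growth=2.0, rho_max=1e3,
                   use_gumbel=True, seed=0):
    """Hypothetical KL projection of per-position token distributions p (L x V)
    onto the relaxed constraint g(y) = sum(w * y) - budget <= 0."""
    rng = np.random.default_rng(seed)
    log_p = np.log(p)
    z = log_p.copy()  # start at the model's logits, where KL(q || p) = 0
    # One Gumbel draw per projection (zeros give a noise-free relaxation).
    gumbel = rng.gumbel(size=p.shape) if use_gumbel else np.zeros_like(p)
    lam = 0.0
    for _ in range(outer_steps):
        for _ in range(inner_steps):
            log_q = log_softmax(z)
            q = np.exp(log_q)
            # Gradient of KL(q || p) with respect to the logits z.
            grad = softmax_vjp(q, log_q - log_p)
            # Gumbel-Softmax relaxed sample y and its constraint value g(y).
            y = np.exp(log_softmax((z + gumbel) / tau))
            g = float((w * y).sum() - budget)
            # Derivative of lam * g + (rho / 2) * max(0, g)^2 with respect to g.
            coef = lam + rho * max(0.0, g)
            grad += softmax_vjp(y, coef * w) / tau
            z -= lr * grad
        y = np.exp(log_softmax((z + gumbel) / tau))
        g = float((w * y).sum() - budget)
        lam = max(0.0, lam + rho * g)         # dual ascent on the multiplier
        rho = min(rho * rho_growth, rho_max)  # tighten the quadratic penalty
    return np.exp(log_softmax(z)), z
```

Take a cost vector `w` that marks one forbidden token. A feasible input comes back essentially unchanged, because both the KL gradient and the penalty vanish at the starting point. An infeasible input has the forbidden token's probability pushed down until the relaxed constraint is (approximately) met.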

Abstract

Discrete diffusion models are a class of generative models that construct sequences by progressively denoising samples from a categorical noise distribution. Beyond their rapidly growing ability to generate coherent natural language, these models present a new and important opportunity to enforce sequence-level constraints, a capability that current autoregressive models cannot natively provide. This paper capitalizes on this opportunity by introducing Constrained Discrete Diffusion (CDD), a novel integration of differentiable constraint optimization within the diffusion process that ensures generated sequences adhere to constraints, logic rules, or safety requirements. Conventional text generators often rely on post-hoc filtering or model retraining for controllable generation. CDD instead imposes constraints directly within the discrete diffusion sampling process, yielding a training-free and effective approach. Experiments in toxicity-controlled text generation, property-constrained molecule design, and instruction-constrained text completion demonstrate that CDD achieves zero constraint violations across a diverse array of tasks. It does so while preserving fluency, novelty, and coherence and while outperforming autoregressive and existing discrete diffusion approaches.
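To make the sampling side concrete, here is a toy sketch of an absorbing-state (masked) reverse process with a projection hook applied to the denoiser's token distributions at every step. It rests on assumptions that the abstract does not state. The `MASK` id, the `denoiser` callable, the roughly 1/t unmasking schedule, and the helper names `project_forbid` and `sample_masked_diffusion` are all hypothetical.

The example projection handles only the simplest token-level constraint, a banned token. For that case, the minimizer of KL(q || p) over distributions with zero mass on the banned token is exactly "zero it out and renormalize". Sequence-level constraints generally have no such closed form.

```python
import numpy as np

MASK = -1  # hypothetical id for the absorbing [MASK] state


def project_forbid(probs, forbidden):
    """KL(q || p) projection onto {q : q[forbidden] = 0}: zero out, renormalize."""
    q = probs.copy()
    q[..., forbidden] = 0.0
    return q / q.sum(axis=-1, keepdims=True)


def sample_masked_diffusion(denoiser, length, vocab, steps, project=None, seed=0):
    """Toy reverse process: start fully masked and unmask a few positions per step."""
    rng = np.random.default_rng(seed)
    x = np.full(length, MASK)                # x_T: every position is masked
    for t in range(steps, 0, -1):
        masked = np.flatnonzero(x == MASK)
        if masked.size == 0:
            break
        logits = denoiser(x)                 # (length, vocab) unnormalized scores
        probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
        probs /= probs.sum(axis=-1, keepdims=True)
        if project is not None:
            probs = project(probs)           # constraint projection at every step
        # Unmask ceil(remaining / t) positions, so all are filled when t == 1.
        k = int(np.ceil(masked.size / t))
        chosen = rng.choice(masked, size=k, replace=False)
        for i in chosen:
            x[i] = rng.choice(vocab, p=probs[i])
    return x
```

With a toy denoiser that strongly prefers token 2, passing `project=lambda p: project_forbid(p, [2])` guarantees that token 2 never appears. Passing `project=None` recovers the unconstrained sampler.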

Paper Structure

This paper contains 56 sections, 3 theorems, 51 equations, 10 figures, 5 tables, and 1 algorithm.

Key Result

Theorem 4.1

Let $\bm{C}$ be non-empty and $\beta$-prox-regular in the sense of rockafellar2009variational, and let the score network satisfy $\| \nabla_{\bm{x}_t} \log q_t(\bm{x}_t) \| \leq G$ (a standard consequence of the bounded-data domain after normalization). Then, for positive step sizes $\gamma_t \le \ldots$, …, where $\bm{\upalpha}_t$ is proportional to the discrete Langevin step size $\gamma_t$ and to the bound $G$.
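For reference, the KL projection onto $\bm{C}$ and one common augmented Lagrangian relaxation of it can be written as follows. This is a generic statement under assumed notation, not the paper's exact formulation. It treats $\bm{x}_t$ as a vector of per-token probabilities, with $\Delta$ the probability simplex. It also assumes that $\bm{C}$ is described by a single constraint function $g$. The multiplier $\lambda$, the penalty $\rho$, and the update rule are illustrative.

```latex
\mathcal{P}_{\bm{C}}(\bm{x}_t) = \operatorname*{arg\,min}_{\bm{y} \in \bm{C}} D_{\mathrm{KL}}(\bm{y} \,\|\, \bm{x}_t),
\qquad \bm{C} = \{\bm{y} \in \Delta : g(\bm{y}) \le 0\}

\mathcal{L}_{\rho}(\bm{y}, \lambda) = D_{\mathrm{KL}}(\bm{y} \,\|\, \bm{x}_t) + \lambda\, g(\bm{y}) + \frac{\rho}{2} \max\{0,\, g(\bm{y})\}^2,
\qquad \lambda \leftarrow \max\{0,\, \lambda + \rho\, g(\bm{y})\}
```

Each outer iteration minimizes $\mathcal{L}_{\rho}$ in $\bm{y}$, then updates $\lambda$ and increases $\rho$. Under the usual conditions for augmented Lagrangian methods, repeating this drives $g(\bm{y})$ toward feasibility while the KL term limits drift from $\bm{x}_t$.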

Figures (10)

  • Figure 1: Comparison of Constrained Discrete Diffusion and baseline models. CDD imposes constraints without sacrificing fluency or expressiveness.
  • Figure 2: Illustration of CDD's projection step embedded throughout the sampling process.
  • Figure 3: Results across different toxicity thresholds, PPL percentiles, LLM‑Judge coherence, and coherence degradation levels. PPL is evaluated with GPT‑2‑XL, and coherence metrics are assessed by the LLM‑Judge. Bold and underlined values denote the best and second‑best, respectively.
  • Figure 4: Left: Results for molecule generation under synthetic accessibility constraints. QED and constraint violations are reported only for valid molecules; a molecule counts as novel only if it is valid and has no violation ($\tau \leq 3.0$). Right: Results for novelty projection with and without QED guidance. Violation is the percentage of generated molecules that are valid but not novel. QED is reported only for novel molecules. Results denoted with $^\dagger$ are as reported by schiff2024simple. Bold and underlined values mark the best and second-best, respectively.
  • Figure 5: Synthetic Accessibility score distributions for CDD versus competing baselines. Importantly, CDD is the only model that never violates the specified synthetic accessibility thresholds.

Theorems & Definitions (5)

  • Theorem 4.1: Convergence of CDD
  • Proof
  • Lemma F.1
  • Proof of non-expansiveness of the projection operator
  • Theorem F.2