Discrete Copula Diffusion

Anji Liu, Oliver Broadrick, Mathias Niepert, Guy Van den Broeck

TL;DR

Discrete Copula Diffusion tackles the core limitation of discrete diffusion models: neglecting inter-variable dependencies between denoised outputs. By introducing an inference-time copula model and formalizing an I-projection to combine univariate diffusion marginals with a dependency structure, the method enables high-quality few-step generation. The autoregressive copula instantiation (DCD) yields substantial efficiency and quality gains in unconditional and conditional text and antibody sequence infilling. The approach highlights the central role of modeling variable dependencies in discrete diffusion and offers a practical path to faster, controllable discrete generation.
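To make the idea of combining univariate marginals with a dependency structure concrete, here is a minimal two-variable sketch. It assumes the I-projection is onto the set of joint distributions whose univariate marginals match given targets, and computes it with iterative proportional fitting (a standard procedure for that projection, used here for illustration rather than taken from the paper). The function name `i_projection` and the toy numbers are hypothetical.

```python
import numpy as np

def i_projection(copula_joint, marginals, iters=200):
    """Rescale a joint table until every axis marginal matches its target.

    Iterative proportional fitting: each pass multiplies the table by
    per-axis ratios target / current, which only reweights single variables
    and therefore keeps the dependency pattern of the starting table.
    """
    p = np.asarray(copula_joint, dtype=float).copy()
    p /= p.sum()
    for _ in range(iters):
        for axis, target in enumerate(marginals):
            other_axes = tuple(a for a in range(p.ndim) if a != axis)
            current = p.sum(axis=other_axes)
            ratio = np.divide(np.asarray(target, dtype=float), current,
                              out=np.zeros_like(current), where=current > 0)
            shape = [1] * p.ndim
            shape[axis] = -1
            p = p * ratio.reshape(shape)
    return p

# Hypothetical stand-in for a copula model: strong positive dependency.
copula_joint = np.array([[0.45, 0.05],
                         [0.05, 0.45]])
# Hypothetical stand-ins for the per-variable marginals from a diffusion model.
target_marginals = [np.array([0.7, 0.3]), np.array([0.6, 0.4])]

projected = i_projection(copula_joint, target_marginals)

def odds_ratio(table):
    """Dependency measure for a 2x2 table, unchanged by per-axis rescaling."""
    return table[0, 0] * table[1, 1] / (table[0, 1] * table[1, 0])

print(projected)
print(odds_ratio(copula_joint), odds_ratio(projected))
```

In this toy case the result takes its marginals from the targets and its odds ratio from the starting table, which is the division of labor the TL;DR describes. A real sequence model has far too many joint states to tabulate, so this is only an illustration of the principle.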

Abstract

Discrete diffusion models have recently shown significant progress in modeling complex data, such as natural languages and DNA sequences. However, unlike diffusion models for continuous data, which can generate high-quality samples in just a few denoising steps, modern discrete diffusion models still require hundreds or even thousands of denoising steps to perform well. In this paper, we identify a fundamental limitation that prevents discrete diffusion models from achieving strong performance with fewer steps -- they fail to capture dependencies between output variables at each denoising step. To address this issue, we provide a formal explanation and introduce a general approach to supplement the missing dependency information by incorporating another deep generative model, termed the copula model. Our method does not require fine-tuning either the diffusion model or the copula model, yet it enables high-quality sample generation with significantly fewer denoising steps. When we apply this approach to autoregressive copula models, the combined model outperforms both models individually in unconditional and conditional text generation. Specifically, the hybrid model achieves better (un)conditional text generation using 8 to 32 times fewer denoising steps than the diffusion model alone. In addition to presenting an effective discrete diffusion generation algorithm, this paper emphasizes the importance of modeling inter-variable dependencies in discrete diffusion.
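The limitation described above can be seen in a toy two-token example. The sketch below is an assumption-laden illustration, not the paper's experiment: the vocabulary ("New"/"San", "York"/"Francisco") and the probabilities are hypothetical. It compares a joint distribution with the product of its univariate marginals, which is all a fully factorized denoising step can represent, and reports the total correlation (sum of marginal entropies minus the joint entropy) as one standard measure of the dependency that is lost.

```python
import numpy as np

# Hypothetical joint over two masked tokens.
# Rows: first token in {"New", "San"}; columns: second token in {"York", "Francisco"}.
joint = np.array([[0.5, 0.0],
                  [0.0, 0.5]])

# Univariate marginals, i.e. what a factorized denoiser predicts per position.
first = joint.sum(axis=1)
second = joint.sum(axis=0)

# Sampling each position independently corresponds to the product of marginals.
factorized = np.outer(first, second)

def entropy_bits(p):
    """Shannon entropy in bits, ignoring zero-probability entries."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Total correlation: dependency information absent from the marginals.
total_correlation = entropy_bits(first) + entropy_bits(second) - entropy_bits(joint)

# Probability that independent sampling yields a pair the joint never produces
# (for example "New Francisco").
invalid_mass = float(factorized[joint == 0].sum())

print(total_correlation, invalid_mass)
```

Here the marginals are individually correct, yet independent sampling places probability on pairs that never occur under the joint. Unmasking many tokens in a single step amplifies this kind of error, which is consistent with the abstract's point about few-step generation.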

Paper Structure (27 sections, 8 theorems, 53 equations, 14 figures, 1 table, 2 algorithms)


Key Result

Proposition 1

Assume the denoising distributions $\{p_{\theta}(\boldsymbol{x}_{t} \mid \boldsymbol{x}_{t+1})\}_{t=0}^{T-1}$ are fully factorized. Let $\mathrm{H}(p(\mathbf{X}))$ denote the entropy of $p(\mathbf{X})$. For any choice of denoising distributions (or equivalently, any parameterization $\theta$), ...

Figures (14)

  • Figure 1: Discrete Copula Diffusion (DCD). At each denoising step, a partially completed sequence is given as input (top-left). The diffusion model independently predicts the univariate marginals for each masked token, which leads to the samples in the bottom-left. DCD introduces an additional copula model (top-right) to capture the inter-variable dependencies, thereby supplementing the information missed by the diffusion model. By combining outputs from both models in a principled way, DCD achieves better performance than either model individually (see improved samples in the bottom-right), enabling few-step discrete diffusion generation.
  • Figure 2: Illustration of the decomposition of a distribution into univariate marginals and a copula.
  • Figure 3: Generative perplexity ($\downarrow$) with different numbers of denoising steps.
  • Figure 4: Generated text from $\text{SEDD}_{\text{M}}$ and DCD with different numbers of steps. See the appendix on additional text samples for more.
  • Figure 5: Sampling time vs. generative perplexity (the autoregressive version of DCD is used).
  • ...and 9 more figures

Theorems & Definitions (18)

  • Proposition 1
  • Definition 1
  • Proposition 2
  • Proposition 3
  • Theorem 1
  • Proposition 4
  • Proposition 5
  • Proposition 6
  • Proof of the proposition labeled prop:elbo-decomp
  • Proof of the proposition labeled prop:iproj-is-good
  • ...and 8 more