Fast Solvers for Discrete Diffusion Models: Theory and Applications of High-Order Algorithms

Yinuo Ren, Haoxuan Chen, Yuchen Zhu, Wei Guo, Yongxin Chen, Grant M. Rotskoff, Molei Tao, Lexing Ying

TL;DR

The paper tackles inference efficiency in discrete diffusion models by introducing two high-order solvers, θ-RK-2 and θ-Trapezoidal. It provides a rigorous stochastic-integral formulation and proves second-order convergence for the trapezoidal scheme in $D_{\mathrm{KL}}$, with conditional guarantees for the RK-2 variant. Empirical results across toy tasks, text, and image generation demonstrate superior sample quality under fixed compute budgets, generalizing from 200M to 8B parameter scales. The work paves the way for faster, more accurate discrete diffusion inference and highlights robustness and scalability as key benefits.

Abstract

Discrete diffusion models have emerged as a powerful generative modeling framework for discrete data with successful applications spanning from text generation to image synthesis. However, their deployment faces challenges due to the high dimensionality of the state space, necessitating the development of efficient inference algorithms. Current inference approaches mainly fall into two categories: exact simulation and approximate methods such as $\tau$-leaping. While exact methods suffer from unpredictable inference time and redundant function evaluations, $\tau$-leaping is limited by its first-order accuracy. In this work, we advance the latter category by tailoring the first extension of high-order numerical inference schemes to discrete diffusion models, enabling larger step sizes while reducing error. We rigorously analyze the proposed schemes and establish the second-order accuracy of the $\theta$-Trapezoidal method in KL divergence. Empirical evaluations on GSM8K-level math-reasoning, GPT-2-level text, and ImageNet-level image generation tasks demonstrate that our method achieves superior sample quality compared to existing approaches under equivalent computational constraints, with consistent performance gains across models ranging from 200M to 8B. Our code is available at https://github.com/yuchen-zhu-zyc/DiscreteFastSolver.
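
For readers unfamiliar with the first-order baseline named in the abstract, the following is a minimal sketch of generic $\tau$-leaping on a small continuous-time Markov chain. It is not the paper's $\theta$-RK-2 or $\theta$-Trapezoidal scheme and is not taken from the linked repository. The names `sample_poisson` and `tau_leaping`, the two-state rate table, and the rule for resolving a step with several proposed jumps are all assumptions made for illustration.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw a Poisson(lam) count with Knuth's multiplication method (suited to small lam)."""
    if lam <= 0.0:
        return 0
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def tau_leaping(rates, x0, horizon, n_steps, rng):
    """Approximately simulate a CTMC on states 0..S-1 up to time `horizon`.

    rates[x][y] is the jump rate from state x to state y; diagonal entries are ignored.
    """
    n_states = len(rates)
    tau = horizon / n_steps
    x = x0
    for _ in range(n_steps):
        # Freeze the outgoing rates at the current state for the whole step:
        # this is the source of the first-order error.
        counts = [
            0 if y == x else sample_poisson(rates[x][y] * tau, rng)
            for y in range(n_states)
        ]
        if sum(counts) > 0:
            # Assumption (illustrative only): when several jumps are proposed
            # in one step, move to a single target drawn in proportion to the counts.
            x = rng.choices(range(n_states), weights=counts)[0]
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    rates = [[0.0, 1.0], [2.0, 0.0]]  # hypothetical two-state chain
    draws = [tau_leaping(rates, 0, horizon=5.0, n_steps=50, rng=rng) for _ in range(2000)]
    print("fraction in state 1:", sum(draws) / len(draws))
```

For this hypothetical chain the exact stationary mass of state 1 is 1/3, so the printed fraction can be compared against that value, with some bias expected from the finite step size. Each step costs one evaluation of the outgoing rates, which corresponds to one function evaluation (NFE) in the discrete diffusion setting.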

Paper Structure

This paper contains 39 sections, 8 theorems, 44 equations, 8 figures, 2 tables, 3 algorithms.

Key Result

Theorem 3.1

Under a certain discretization scheme and technical assumptions, and given an $\epsilon$-accurate score function, an error bound holds in terms of the following quantities: $\delta \ll 1$ is the early stopping time, $\kappa$ controls the step size, and $T$ is the time horizon. The notation $\lesssim$ indicates that the inequality holds up to a constant factor as $\kappa \to 0$.

Figures (8)

  • Figure 1: Left: Application of the uniformization algorithm to discrete diffusion models for text generation. The $x$-axis denotes the time of the backward process, and the $y$-axis denotes the frequency of jumps (NFE). Perplexity convergence occurs before the NFE grows unbounded. Right: Comparison between $\tau$-leaping and the proposed second-order schemes ($\theta$-RK-2 and $\theta$-Trapezoidal).
  • Table 1: Generative perplexity (on GPT-2 large) of texts generated by different sampling algorithms. Lower values are better, with the best in bold.
  • Figure 2: Empirical KL divergence between the true and generated distributions of the toy model vs. number of steps. Data are fitted by linear regression, with a 95% confidence interval obtained by bootstrapping.
  • Table 2: Response accuracy on GSM8K with different NFEs. The best results are in bold.
  • Figure 3: FID of images generated by different sampling algorithms vs. number of function evaluations (NFE). Lower values are better.
  • ...and 3 more figures
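
The fitting procedure mentioned in the Figure 2 caption (linear regression with a bootstrapped 95% confidence interval) can be sketched as follows. This is a rough illustration, not the authors' analysis script. It assumes the regression is done on log-log axes, so that the slope estimates the convergence order, and it uses clearly synthetic error values generated from an exact power law. The function names `loglog_slope` and `bootstrap_slope_ci` are hypothetical.

```python
import math
import random


def loglog_slope(steps, errors):
    """Least-squares slope of log(error) against log(number of steps)."""
    xs = [math.log(s) for s in steps]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def bootstrap_slope_ci(steps, errors, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for the log-log slope (resampling data pairs)."""
    rng = random.Random(seed)
    pairs = list(zip(steps, errors))
    slopes = []
    while len(slopes) < n_boot:
        sample = [rng.choice(pairs) for _ in pairs]
        if len({s for s, _ in sample}) < 2:
            continue  # skip degenerate resamples with a single distinct step count
        slopes.append(loglog_slope(*zip(*sample)))
    slopes.sort()
    lower = slopes[int((1 - level) / 2 * n_boot)]
    upper = slopes[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Synthetic data only: errors follow an exact second-order law C / n^2.
    steps = [16, 32, 64, 128, 256]
    errors = [3.0 * n ** -2 for n in steps]
    print("slope:", loglog_slope(steps, errors))
    print("95% CI:", bootstrap_slope_ci(steps, errors))
```

Under this assumed convention, a slope near -1 would indicate first-order behavior and a slope near -2 second-order behavior. With measured KL values the interval has nonzero width, unlike in the noiseless synthetic case here.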

Theorems & Definitions (24)

  • Definition 2.1: Informal Definition of Poisson Random Measure
  • Theorem 3.1: Thm. 4.7 in ren2024discrete
  • Theorem 5.4: Second Order Convergence of $\theta$-Trapezoidal Method
  • Theorem 5.5: Conditional Second-Order Convergence of $\theta$-RK-2 Method
  • Remark 5.6: Comparison between Trapezoidal and RK-2 Methods
  • Remark 5.7: Remark on the Positivity of Extrapolated Intensity
  • Remark 6.1: Algorithm Hyperparameters
  • Definition B.1: Poisson Random Measure (ren2024discrete)
  • Definition B.2: Poisson Random Measure with Evolving Intensity (ren2024discrete)
  • Remark B.3: Construction of Poisson Random Measure with Evolving Intensity
  • ...and 14 more