Convergence Analysis of Discrete Diffusion Model: Exact Implementation through Uniformization

Hongrui Chen, Lexing Ying

TL;DR

This work develops a theory for discrete diffusion models implemented via continuous-time Markov chains on a finite state space, focusing on an exact reversed sampling algorithm achieved through uniformization. It establishes TV and KL guarantees under an ε-accurate score-entropy loss and demonstrates near-linear scalability in the dimension d, addressing key discretization challenges that plague SDE-based approaches. Centered on an independent-flips forward process on the hypercube, the paper derives explicit transition kernels and provides a concrete, adaptive-λ algorithm with quantified complexity and error bounds. The results position discrete diffusion as a competitive alternative to continuous diffusion for intrinsically discrete data, and outline avenues for faster algorithms and richer forward-process designs.

Abstract

Diffusion models have achieved huge empirical success in data generation tasks. Recently, some efforts have been made to adapt the framework of diffusion models to discrete state space, providing a more natural approach for modeling intrinsically discrete data, such as language and graphs. This is achieved by formulating both the forward noising process and the corresponding reversed process as Continuous Time Markov Chains (CTMCs). In this paper, we investigate the theoretical properties of the discrete diffusion model. Specifically, we introduce an algorithm leveraging the uniformization of continuous Markov chains, implementing transitions on random time points. Under reasonable assumptions on the learning of the discrete score function, we derive Total Variation distance and KL divergence guarantees for sampling from any distribution on a hypercube. Our results align with state-of-the-art achievements for diffusion models in $\mathbb{R}^d$ and further underscore the advantages of discrete diffusion models in comparison to the $\mathbb{R}^d$ setting.

Paper Structure

This paper contains 17 sections, 8 theorems, 44 equations, 1 algorithm.

Key Result

Proposition 1

Consider a general CTMC on a finite state space $\mathcal{X}$ with generator $Q(t)$, and let $p(t)$ be its distribution at time $t$. Suppose $-Q_{x,x}(t) \le \lambda$ for all $x \in \mathcal{X}$ and $0 \le t \le T$. Let $(\tilde{P}(t))_{t \ge 0}$ be the transition kernels given by the paper's kernel equation.
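
To make the idea of uniformization concrete, here is a minimal Python sketch of the textbook construction, not the paper's algorithm. It assumes a row convention in which $Q_{x,y}(t) \ge 0$ is the jump rate from $x$ to $y$ and each row sums to zero, a bound $\lambda$ on the exit rates $-Q_{x,x}(t)$, and the standard uniformized kernel $I + Q(t)/\lambda$; the paper's own kernel may be stated differently. The function names `sample_by_uniformization`, `flip_generator`, and `estimate_flip_probability` are hypothetical. The sampler draws a Poisson($\lambda T$) number of candidate times, places them uniformly on $[0, T]$, and applies the kernel at each one, so no time-step discretization is involved.

```python
import numpy as np


def sample_by_uniformization(x0, generator, lam, T, rng):
    """Draw X_T for a finite-state CTMC started at state x0.

    Assumptions (not taken from the paper): generator(t) returns Q(t) with
    Q[x, y] >= 0 the rate of jumping x -> y, rows summing to zero, and
    -Q[x, x] <= lam for every x and every t in [0, T].
    """
    # Number of candidate transition times is Poisson(lam * T).
    n_events = rng.poisson(lam * T)
    # Given the count, the candidate times are i.i.d. uniform on [0, T].
    times = np.sort(rng.uniform(0.0, T, size=n_events))
    x = x0
    for t in times:
        Q = generator(t)
        # Row x of the uniformized kernel I + Q(t) / lam.
        probs = Q[x] / lam
        probs[x] += 1.0
        x = rng.choice(len(probs), p=probs)
    return int(x)


def flip_generator(t):
    """One coordinate of an independent-flips process: flip at rate 1."""
    return np.array([[-1.0, 1.0], [1.0, -1.0]])


def estimate_flip_probability(T, lam, n_samples, seed=0):
    """Monte Carlo estimate of P(X_T != X_0) for the single-bit flip chain."""
    rng = np.random.default_rng(seed)
    draws = [
        sample_by_uniformization(0, flip_generator, lam, T, rng)
        for _ in range(n_samples)
    ]
    return float(np.mean(draws))
```

For the rate-1 single-bit chain the flip probability has the closed form $(1 - e^{-2T})/2$, so `estimate_flip_probability(0.5, 2.0, 20000)` can be compared against it as a sanity check. Any $\lambda$ at least as large as the maximal exit rate gives the same law for $X_T$; a larger $\lambda$ only adds candidate times at which the chain stays put.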

Theorems & Definitions (14)

  • Proposition 1: Uniformization of CTMC
  • Proposition 2
  • Proposition 3
  • proof
  • Proposition 4
  • Proposition 5
  • proof
  • Theorem 6
  • Remark 1
  • Remark 2
  • ...and 4 more