Convergence of Score-Based Discrete Diffusion Models: A Discrete-Time Analysis

Zikun Zhang, Zixiang Chen, Quanquan Gu

TL;DR

This work develops a theoretical framework for score-based discrete diffusion models on the general discrete state space $[S]^d$ under a continuous-time Markov chain (CTMC) forward process. It introduces a discrete-time reverse-sampling algorithm that uses score estimators at fixed time points and provides convergence guarantees in KL divergence and total variation distance, with and without early stopping, using a Girsanov-based analysis. The bounds scale nearly linearly in the dimension $d$ and decompose into score-estimation error, discretization error, and truncation/mixing error, extending prior results from the hypercube setting to the broader $[S]^d$ space. The results also tie the mixing of the forward process to exponential convergence to the uniform distribution via a modified log-Sobolev inequality (MLSI), and they offer practical guidance on discretization and score clipping. Together these contribute to the theoretical foundations of discrete diffusion with CTMC dynamics and open paths toward acceleration and broader rate-matrix settings.

Abstract

Diffusion models have achieved great success in generating high-dimensional samples across various applications. While the theoretical guarantees for continuous-state diffusion models have been extensively studied, the convergence analysis of the discrete-state counterparts remains under-explored. In this paper, we study the theoretical aspects of score-based discrete diffusion models under the Continuous Time Markov Chain (CTMC) framework. We introduce a discrete-time sampling algorithm in the general state space $[S]^d$ that utilizes score estimators at predefined time points. We derive convergence bounds for the Kullback-Leibler (KL) divergence and total variation (TV) distance between the generated sample distribution and the data distribution, considering both scenarios with and without early stopping under reasonable assumptions. Notably, our KL divergence bounds are nearly linear in the dimension $d$, aligning with state-of-the-art results for diffusion models. Our convergence analysis employs a Girsanov-based method and establishes key properties of the discrete score function, which are essential for characterizing the discrete-time sampling process.

Paper Structure

This paper contains 25 sections, 18 theorems, 108 equations, 1 figure, 1 table, 1 algorithm.

Key Result

Proposition 1

Let $P^i_{s,t}\in\mathbb{R}^{S\times S}$ be the transition probability matrix of the $i$-th dimensional forward CTMC $X^i$ from time $s$ to time $t$, i.e., $P^i_{s,t}(x,y)=q^i_{t|s}(y|x)$ for all $x,y\in [S]$ and $i\in [d]$. Then for all $i\in [d]$, $P^i_{s,t}\equiv P^0_{s,t}$, where … Let $P_{s,t}\in\mathbb{R}^{S^d\times S^d}$ be the transition probability matrix of the forward process from time $s$ …
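
To make the objects in Proposition 1 concrete, the following is a minimal sketch under an assumption that is not stated in the excerpt above: each coordinate is taken to evolve independently under the uniform rate matrix $Q=\frac{1}{S}\mathbf{1}\mathbf{1}^\top - I$. This choice is consistent with the forward process converging to the uniform distribution, but it may differ from the paper's exact definition of $P^0_{s,t}$. The function names `token_transition`, `joint_transition`, and `kl_to_uniform` are hypothetical helpers introduced only for illustration.

```python
import math
from itertools import product


def token_transition(S, s, t):
    """Per-coordinate transition matrix from time s to time t.

    ASSUMPTION (not from the source): rate matrix Q = (1/S) * 11^T - I.
    Since Q @ Q = -Q, the matrix exponential has the closed form
    exp((t - s) Q) = e^{-(t-s)} I + (1 - e^{-(t-s)}) / S * 11^T.
    """
    decay = math.exp(-(t - s))
    off_diag = (1.0 - decay) / S
    return [
        [off_diag + (decay if x == y else 0.0) for y in range(S)]
        for x in range(S)
    ]


def joint_transition(S, d, s, t):
    """Transition matrix on [S]^d when the d coordinates move independently.

    Each entry is the product of per-coordinate transition probabilities,
    so the result is an S^d x S^d matrix.
    """
    P0 = token_transition(S, s, t)
    states = list(product(range(S), repeat=d))
    return [
        [math.prod(P0[xi][yi] for xi, yi in zip(x, y)) for y in states]
        for x in states
    ]


def kl_to_uniform(row):
    """KL divergence between a distribution on [S] and the uniform one."""
    S = len(row)
    return sum(p * math.log(p * S) for p in row if p > 0.0)
```

Under this assumed rate matrix, every coordinate shares the same $S\times S$ matrix, and `kl_to_uniform(token_transition(S, 0.0, t)[x])` shrinks as `t` grows, which illustrates the mixing toward the uniform distribution that the truncation error term accounts for.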

Figures (1)

  • Figure 1: Illustration of the processes $Y$, $Z$ and $\tilde{Z}$. $Y$ is the true reverse process, $Z$ is the sampling process starting from the noise $\pi^d$, and $\tilde{Z}$ is the same as $Z$ except for the initialization. Both $Y$ and $\tilde{Z}$ start from $q_T$. The random variables (e.g., $Y_t, Z_t$) are shown with their corresponding probability laws in parentheses (e.g., $q_{T-t}, p_t$).

Theorems & Definitions (19)

  • Proposition 1
  • Theorem 1
  • Theorem 2
  • Corollary 1
  • Corollary 2
  • Proposition 2
  • Lemma 1
  • Lemma 2: Score bound
  • Lemma 3: Score movement bound
  • Lemma 4: Score bound
  • ...and 9 more