Convergence of Score-Based Discrete Diffusion Models: A Discrete-Time Analysis
Zikun Zhang, Zixiang Chen, Quanquan Gu
TL;DR
This work develops a theoretical framework for score-based discrete diffusion models on the general discrete state space $[S]^d$, with a continuous-time Markov chain (CTMC) as the forward process. It introduces a discrete-time reverse-sampling algorithm that uses score estimators at fixed time points, and it uses a Girsanov-based analysis to prove convergence guarantees in KL divergence and total variation distance, both with and without early stopping. The bounds scale nearly linearly in the dimension $d$ and decompose into score-estimation error, discretization error, and truncation/mixing error, extending prior results for the hypercube to the broader space $[S]^d$. The results also tie the mixing of the forward process to exponential convergence to the uniform distribution via a modified log-Sobolev inequality (MLSI), and they give practical guidance on discretization and score clipping. Together these strengthen the theoretical foundations of discrete diffusion under CTMC dynamics and open paths toward acceleration and more general rate matrices.
Abstract
Diffusion models have achieved great success in generating high-dimensional samples across various applications. While the theoretical guarantees for continuous-state diffusion models have been extensively studied, the convergence analysis of their discrete-state counterparts remains under-explored. In this paper, we study the theoretical aspects of score-based discrete diffusion models under the continuous-time Markov chain (CTMC) framework. We introduce a discrete-time sampling algorithm in the general state space $[S]^d$ that utilizes score estimators at predefined time points. We derive convergence bounds for the Kullback-Leibler (KL) divergence and total variation (TV) distance between the generated sample distribution and the data distribution, both with and without early stopping, under reasonable assumptions. Notably, our KL divergence bounds are nearly linear in the dimension $d$, aligning with state-of-the-art results for diffusion models. Our convergence analysis employs a Girsanov-based method and establishes key properties of the discrete score function, which are essential for characterizing the discrete-time sampling process.
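The role of the CTMC forward process and the discrete score can be illustrated on a single coordinate. The following is a minimal sketch, not the paper's algorithm: it assumes one coordinate taking values in $[S]$ and a uniform rate matrix $Q = \frac{1}{S}\mathbf{1}\mathbf{1}^\top - I$, which this section does not specify. The function names `forward_marginal`, `kl_to_uniform`, and `score_ratios` are hypothetical. Under this assumed $Q$, the time-$t$ marginal has the closed form $q_t = e^{-t} q_0 + (1 - e^{-t})/S$, so the KL divergence to the uniform distribution shrinks as $t$ grows, and the discrete score is the matrix of probability ratios $q_t(y)/q_t(x)$.

```python
import numpy as np

def forward_marginal(q0, t):
    """Time-t marginal of a CTMC started from q0.

    Assumption (not from the source): rate matrix Q = (1/S) * ones - I,
    for which exp(tQ) = e^{-t} I + (1 - e^{-t}) * (1/S) * ones.
    """
    S = q0.shape[0]
    decay = np.exp(-t)
    return decay * q0 + (1.0 - decay) / S

def kl_to_uniform(q):
    """KL(q || uniform) on S states, with the convention 0 * log 0 = 0."""
    S = q.shape[0]
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] * S)))

def score_ratios(q):
    """Discrete score as a ratio matrix: entry [x, y] equals q(y) / q(x).

    Requires q to be strictly positive, which holds for any t > 0 here.
    """
    return q[None, :] / q[:, None]

# Illustrative starting distribution on S = 4 states (made-up numbers).
q0 = np.array([0.7, 0.2, 0.1, 0.0])
for t in (0.0, 0.5, 1.0, 2.0, 4.0):
    qt = forward_marginal(q0, t)
    print(f"t = {t:.1f}  KL(q_t || uniform) = {kl_to_uniform(qt):.6f}")

# Score ratios at t = 1; the diagonal is 1 by construction.
print(score_ratios(forward_marginal(q0, 1.0)))
```

In this toy setting the ratios are exact; in the setting of the paper they are replaced by learned score estimators evaluated at predefined time points, which is where the score-estimation error term in the bounds comes from.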
