Discrete Diffusion Models: Novel Analysis and New Sampler Guarantees
Yuchen Liang, Yingbin Liang, Lifeng Lai, Ness Shroff
TL;DR
This work addresses the theoretical understanding of discrete diffusion models by introducing a Girsanov-free, differential-inequality framework that analyzes the rate of change of the KL divergence between the true posterior and sampling distributions via the Kolmogorov equations. It removes the strong regularity assumptions required by prior analyses and proves that the standard $\tau$-leaping sampler converges with a linear dependence on the vocabulary size $S$, improving over previous quadratic bounds. The framework also yields the first convergence guarantees for practical deterministic-step samplers, including the Euler method and Tweedie $\tau$-leaping, by constructing an asymptotically equivalent approximate sampler. Collectively, the results offer tighter, more broadly applicable guarantees and suggest practical step-size strategies with favorable scaling for large vocabularies, which is especially relevant to NLP and graph-based discrete data tasks. The methods and corollaries presented may also inform future analyses of stochastic processes beyond discrete diffusion models.
Abstract
Discrete diffusion models have recently gained significant prominence in applications involving natural language and graph data. A key factor influencing their effectiveness is the efficiency of discretized samplers. Among these, $\tau$-leaping samplers have become particularly popular due to their theoretical and empirical success. However, existing theoretical analyses of $\tau$-leaping often rely on somewhat restrictive and difficult-to-verify regularity assumptions, and their convergence bounds contain quadratic dependence on the vocabulary size. In this work, we introduce a new analytical approach for discrete diffusion models that removes the need for such assumptions. For the standard $\tau$-leaping method, we establish convergence guarantees in KL divergence that scale linearly with vocabulary size, improving upon prior results with quadratic dependence. Our approach is also more broadly applicable: it provides the first convergence guarantees for other widely used samplers, including the Euler method and Tweedie $\tau$-leaping. Central to our approach is a novel technique based on differential inequalities, offering a more flexible alternative to the traditional Girsanov change-of-measure methods. This technique may also be of independent interest for the analysis of other stochastic processes.
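
For readers unfamiliar with $\tau$-leaping, the following is a minimal sketch of a single step in the form commonly described in the discrete diffusion literature; it is not taken from this paper. The function name `tau_leaping_step` and the `rates` array are hypothetical: in practice the reverse-time jump rates would come from a learned score model, whereas here they are simply passed in. The rule of keeping a token unchanged when zero or several jumps are drawn is one common convention for categorical data and is an assumption of this sketch.

```python
import numpy as np

def tau_leaping_step(x, rates, tau, rng):
    """One tau-leaping step for a token sequence (illustrative sketch).

    x:     int array of shape (d,), current tokens in {0, ..., S-1}
    rates: float array of shape (d, S); rates[i, y] is the (assumed given)
           rate at which position i jumps to state y, held fixed over the step
    tau:   step size
    rng:   numpy random Generator
    """
    d, S = rates.shape
    lam = tau * rates                    # Poisson intensities over the interval
    lam[np.arange(d), x] = 0.0           # no self-jumps
    counts = rng.poisson(lam)            # jump counts per (position, target state)
    total = counts.sum(axis=1)           # total jumps proposed at each position
    target = counts.argmax(axis=1)       # the unique target when total == 1
    # Assumption: apply a jump only when exactly one was drawn; otherwise stay.
    return np.where(total == 1, target, x)

# Usage: d = 3 positions, vocabulary size S = 4, uniform placeholder rates.
rng = np.random.default_rng(0)
x0 = np.array([0, 1, 2])
x1 = tau_leaping_step(x0, np.ones((3, 4)), tau=0.1, rng=rng)
```

All positions are updated in parallel with rates frozen at the start of the step, which is what makes the step size $\tau$ (and its dependence on the vocabulary size $S$) the central quantity in convergence bounds.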
