O(d/T) Convergence Theory for Diffusion Probabilistic Models under Minimal Assumptions
Gen Li, Yuling Yan
TL;DR
This work develops a fast convergence theory, in total variation (TV) distance, for the DDPM sampler under minimal assumptions on the data. It shows that with only $\ell_2$-accurate score estimates and a target distribution with finite first-order moment, the generated distribution converges to the target at rate $O(d/T)$ in TV distance (up to logarithmic factors), where $d$ is the data dimension and $T$ is the number of steps. It further shows that, with carefully designed coefficients, DDPM adapts to unknown low-dimensional structure and attains the sharper rate $O(k/T)$, where $k$ is the intrinsic dimension. The analysis introduces a fine-grained framework that tracks how discretization and score-estimation errors propagate through each step of the reverse process, rather than relying on KL-divergence bounds alone. Together, these results combine fast convergence with robustness to score-estimation error and unknown data geometry, advancing the theoretical understanding of diffusion-based generative modeling.
Abstract
Score-based diffusion models, which generate new data by learning to reverse a diffusion process that perturbs data from the target distribution into noise, have achieved remarkable success across various generative tasks. Despite their superior empirical performance, existing theoretical guarantees are often constrained by stringent assumptions or suboptimal convergence rates. In this paper, we establish a fast convergence theory for the denoising diffusion probabilistic model (DDPM), a widely used SDE-based sampler, under minimal assumptions. Our analysis shows that, provided $\ell_{2}$-accurate estimates of the score functions, the total variation distance between the target and generated distributions is upper bounded by $O(d/T)$ (ignoring logarithmic factors), where $d$ is the data dimensionality and $T$ is the number of steps. This result holds for any target distribution with finite first-order moment. Moreover, we show that with careful coefficient design, the convergence rate improves to $O(k/T)$, where $k$ is the intrinsic dimension of the target data distribution. This highlights the ability of DDPM to automatically adapt to unknown low-dimensional structures, a common feature of natural image distributions. These results are achieved through a novel set of analytical tools that provides a fine-grained characterization of how the error propagates at each step of the reverse process.
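To make the sampler concrete, the following is a minimal sketch of a DDPM reverse process, not the paper's method. It assumes a Gaussian target $N(\mu, s^2 I)$ so that the score of every noised marginal is known in closed form; this removes the score-estimation error and leaves only the discretization error that the $O(d/T)$ bound controls. The linear $\beta$ schedule and the update $Y_{t-1} = (Y_t + \beta_t s_t(Y_t))/\sqrt{\alpha_t} + \sqrt{\beta_t} Z$ follow the common choice from the original DDPM paper (Ho et al., 2020). They are not the specialized coefficients this paper designs for the $O(k/T)$ result. All function names are hypothetical.

```python
import numpy as np

def linear_beta_schedule(T, beta_min=1e-4, beta_max=0.02):
    """Linear noise schedule (assumed; the common DDPM default, not from this paper)."""
    return np.linspace(beta_min, beta_max, T)

def gaussian_score(x, alpha_bar, mu, s):
    """Exact score of X_t when X_0 ~ N(mu, s^2 I) (hypothetical helper).

    The forward process gives X_t = sqrt(alpha_bar) X_0 + sqrt(1 - alpha_bar) W, so
    X_t ~ N(sqrt(alpha_bar) mu, (alpha_bar s^2 + 1 - alpha_bar) I).
    """
    var = alpha_bar * s**2 + (1.0 - alpha_bar)
    return -(x - np.sqrt(alpha_bar) * mu) / var

def ddpm_sample(n, mu, s, T=1000, seed=0):
    """Run T reverse DDPM steps from pure noise, using the exact Gaussian score."""
    rng = np.random.default_rng(seed)
    betas = linear_beta_schedule(T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)  # alpha_bars[t] = prod_{i<=t} alphas[i]
    d = mu.shape[0]
    y = rng.standard_normal((n, d))  # Y_T ~ N(0, I), approximating the law of X_T
    # Array index t corresponds to step t+1 in 1-indexed notation.
    for t in range(T - 1, -1, -1):
        score = gaussian_score(y, alpha_bars[t], mu, s)
        # Deterministic part: (Y_t + beta_t * score) / sqrt(alpha_t)
        y = (y + betas[t] * score) / np.sqrt(alphas[t])
        # Inject fresh Gaussian noise (variance beta_t) except at the final step
        if t > 0:
            y = y + np.sqrt(betas[t]) * rng.standard_normal((n, d))
    return y

# Usage: the sample mean and std should be close to mu and s.
mu = np.array([2.0, -1.0])
samples = ddpm_sample(20000, mu, s=0.5)
print(samples.mean(axis=0), samples.std(axis=0))
```

Because the score is exact here, any remaining mismatch comes from the finite step count $T$ and from initializing at $N(0, I)$ rather than the true law of $X_T$. Replacing `gaussian_score` with a learned estimate would add the $\ell_2$ score-estimation error that the paper's bound also accounts for.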
