O(d/T) Convergence Theory for Diffusion Probabilistic Models under Minimal Assumptions

Gen Li; Yuling Yan

TL;DR

This work develops a fast convergence theory, in total variation (TV) distance, for DDPM samplers under minimal data assumptions. Given only $\ell_{2}$-accurate score estimates and a target distribution with finite first-order moment, the generated distribution converges to the target at rate $O(d/T)$ in TV distance (up to logarithmic factors), where $d$ is the data dimension and $T$ is the number of steps. With a specialized coefficient design, DDPM further adapts to unknown low-dimensional structure and attains the sharper rate $O(k/T)$, where $k$ is the intrinsic dimension. The analysis introduces a fine-grained framework that tracks, step by step, how discretization and estimation errors propagate through the reverse process, rather than relying on KL bounds alone. Together, these results combine fast convergence with robustness to model misspecification and unknown data geometry.
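
As a back-of-the-envelope reading of these rates: if the TV error scales like $d/T$, then reaching accuracy $\varepsilon$ takes on the order of $d/\varepsilon$ steps, and on the order of $k/\varepsilon$ steps under the low-dimensional rate. The helper below is only an illustrative sketch. Its name, the `const` argument, and the choice to drop logarithmic factors are assumptions, since the summary does not give the constants.

```python
import math


def steps_for_accuracy(dim, eps, const=1.0):
    """Smallest integer T with const * dim / T <= eps.

    `dim` is the ambient dimension d (or the intrinsic dimension k),
    `eps` the target TV accuracy, and `const` a hypothetical stand-in
    for the unknown constant; logarithmic factors are ignored.
    """
    if dim <= 0 or eps <= 0:
        raise ValueError("dim and eps must be positive")
    return math.ceil(const * dim / eps)


# Compare the ambient-dimension rate with the intrinsic-dimension rate.
steps_ambient = steps_for_accuracy(dim=1024, eps=0.25)
steps_intrinsic = steps_for_accuracy(dim=16, eps=0.25)
```

Under these assumptions the step count shrinks by the factor $d/k$ when the sampler adapts to the intrinsic dimension.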

Abstract

Score-based diffusion models, which generate new data by learning to reverse a diffusion process that perturbs data from the target distribution into noise, have achieved remarkable success across various generative tasks. Despite their superior empirical performance, existing theoretical guarantees are often constrained by stringent assumptions or suboptimal convergence rates. In this paper, we establish a fast convergence theory for the denoising diffusion probabilistic model (DDPM), a widely used SDE-based sampler, under minimal assumptions. Our analysis shows that, provided $\ell_{2}$-accurate estimates of the score functions, the total variation distance between the target and generated distributions is upper bounded by $O(d/T)$ (ignoring logarithmic factors), where $d$ is the data dimensionality and $T$ is the number of steps. This result holds for any target distribution with finite first-order moment. Moreover, we show that with careful coefficient design, the convergence rate improves to $O(k/T)$, where $k$ is the intrinsic dimension of the target data distribution. This highlights the ability of DDPM to automatically adapt to unknown low-dimensional structures, a common feature of natural image distributions. These results are achieved through a novel set of analytical tools that provides a fine-grained characterization of how the error propagates at each step of the reverse process.
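
To make the sampler concrete, here is a minimal one-dimensional sketch. It assumes the reverse update has the form $Y_{t-1} = (Y_{t} + \eta_{t} s_{t}(Y_{t}) + \sigma_{t} Z_{t})/\sqrt{\alpha_{t}}$ with the coefficient choice $\eta_{t}=\sigma_{t}^{2}=1-\alpha_{t}$ quoted in Theorem 1. This page refers to the sampler only by its label (eq:DDPM), so the update form is an assumption. The linear schedule, the Gaussian target with its closed-form score, and all function names are hypothetical choices for illustration, not the paper's.

```python
import math
import random


def linear_betas(T, beta_min=1e-4, beta_max=0.05):
    """Hypothetical linear schedule for beta_t = 1 - alpha_t (needs T >= 2)."""
    return [beta_min + (beta_max - beta_min) * t / (T - 1) for t in range(T)]


def gaussian_score(mu, s):
    """Exact score of X_t when X_0 ~ N(mu, s^2) in one dimension.

    Assumes the forward marginal X_t ~ N(sqrt(abar) * mu, abar * s^2 + 1 - abar),
    where abar is the running product of the alpha_t.
    """
    def score(y, abar):
        return -(y - math.sqrt(abar) * mu) / (abar * s * s + 1.0 - abar)
    return score


def ddpm_samples(score, betas, n, rng):
    """Draw n samples by running the assumed reverse update from Y_T to Y_0."""
    # alpha_bars[t] is the product of alpha_i over all steps up to and including t.
    alpha_bars, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        alpha_bars.append(prod)

    out = []
    for _ in range(n):
        y = rng.gauss(0.0, 1.0)  # start from standard Gaussian noise
        for t in reversed(range(len(betas))):
            alpha_t = 1.0 - betas[t]
            eta_t = 1.0 - alpha_t               # eta_t = 1 - alpha_t
            sigma_t = math.sqrt(1.0 - alpha_t)  # sigma_t^2 = 1 - alpha_t
            z = rng.gauss(0.0, 1.0)
            y = (y + eta_t * score(y, alpha_bars[t]) + sigma_t * z) / math.sqrt(alpha_t)
        out.append(y)
    return out


def mean_and_variance(xs):
    """Sample mean and (biased) sample variance."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / len(xs)
```

Calling `ddpm_samples(gaussian_score(2.0, 0.5), linear_betas(200), 4000, random.Random(0))` and passing the result to `mean_and_variance` gives a quick sanity check: with an exact score, the sample mean and variance should land close to those of the target. A learned score estimate would replace `gaussian_score` in practice.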

Paper Structure (33 sections, 20 theorems, 222 equations, 2 tables)

Introduction
Problem set-up
Forward process.
Reverse process.
Main results
General theory: an $O(d/T)$ convergence bound
Adapting to unknown low-dimensional structure
Proof of Theorem 1
Preliminaries
Step 1: introducing auxiliary sequences
Step 2: controlling discretization error
Step 3: controlling estimation error
Proof of Theorem 2
Preliminaries
Main proof
...and 18 more sections

Key Result

Theorem 1

Suppose that the moment assumption (assumption:moment) holds, and take the coefficients of the DDPM sampler (eq:DDPM) to be $\eta_{t}=\sigma_{t}^{2}=1-\alpha_{t}$. Then there exists some universal constant $c>0$ such that

Theorems & Definitions (39)

Theorem 1
Definition 1: Intrinsic dimension
Theorem 2
Lemma 1
proof
Lemma 2
proof
Lemma 3
proof
Lemma 4
...and 29 more
