O(d/T) Convergence Theory for Diffusion Probabilistic Models under Minimal Assumptions

Gen Li, Yuling Yan

TL;DR

This work establishes a fast convergence theory in total variation (TV) distance for DDPM samplers under minimal assumptions on the data. Given only $\ell_{2}$-accurate score estimates and a target distribution with finite first-order moment, the generated distribution converges to the target at rate $O(d/T)$ in TV distance (up to logarithmic factors), where $d$ is the data dimensionality and $T$ is the number of steps. With a specialized coefficient design, DDPM adapts to unknown low-dimensional structure and achieves the sharper rate $O(k/T)$, where $k$ is the intrinsic dimension. The analysis introduces a fine-grained, step-by-step error propagation framework that tracks discretization and estimation errors through the reverse process, avoiding reliance on KL bounds alone. Together, these results combine fast convergence with robustness to model misspecification and unknown data geometry.
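
To get a feel for what the two rates mean in practice, the sketch below computes the smallest number of steps $T$ for which the rate proxy $d/T$ (or $k/T$) falls below a tolerance. It is a back-of-the-envelope illustration only: the universal constant and the logarithmic factors are ignored, and the values of `d`, `k`, and `eps` are hypothetical, not taken from the paper.

```python
import math

def steps_needed(dim: int, eps: float) -> int:
    """Smallest T with dim / T <= eps (constants and log factors ignored)."""
    return math.ceil(dim / eps)

# Hypothetical numbers: a 32x32x3 image has ambient dimension d = 3072;
# k = 64 is an assumed intrinsic dimension chosen only for illustration.
d, k, eps = 3072, 64, 0.125

T_ambient = steps_needed(d, eps)    # step count suggested by the d/T proxy
T_intrinsic = steps_needed(k, eps)  # step count suggested by the k/T proxy
print(T_ambient, T_intrinsic, T_ambient // T_intrinsic)
```

Under these assumed numbers the step count shrinks by the ratio $d/k$, which is the practical content of adapting to low-dimensional structure.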

Abstract

Score-based diffusion models, which generate new data by learning to reverse a diffusion process that perturbs data from the target distribution into noise, have achieved remarkable success across various generative tasks. Despite their superior empirical performance, existing theoretical guarantees are often constrained by stringent assumptions or suboptimal convergence rates. In this paper, we establish a fast convergence theory for the denoising diffusion probabilistic model (DDPM), a widely used SDE-based sampler, under minimal assumptions. Our analysis shows that, provided $\ell_{2}$-accurate estimates of the score functions, the total variation distance between the target and generated distributions is upper bounded by $O(d/T)$ (ignoring logarithmic factors), where $d$ is the data dimensionality and $T$ is the number of steps. This result holds for any target distribution with finite first-order moment. Moreover, we show that with careful coefficient design, the convergence rate improves to $O(k/T)$, where $k$ is the intrinsic dimension of the target data distribution. This highlights the ability of DDPM to automatically adapt to unknown low-dimensional structures, a common feature of natural image distributions. These results are achieved through a novel set of analytical tools that provides a fine-grained characterization of how the error propagates at each step of the reverse process.
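
The reverse process described above can be sketched in a few lines. The example below is a minimal illustration, not the authors' implementation: it assumes the common DDPM update form $Y_{t-1} = (Y_t + \eta_t s_t(Y_t) + \sigma_t Z_t)/\sqrt{\alpha_t}$ with the coefficient choice $\eta_t = \sigma_t^2 = 1-\alpha_t$ stated in Theorem 1, a forward process $X_t = \sqrt{\alpha_t} X_{t-1} + \sqrt{1-\alpha_t} W_t$, a linear schedule for $1-\alpha_t$, and a toy Gaussian target whose score is known in closed form. The names `ddpm_sample` and `exact_score` are hypothetical.

```python
import numpy as np

def ddpm_sample(score, alphas, n, d, rng):
    """Run the reverse chain from Y_T ~ N(0, I), assuming the update
    Y_{t-1} = (Y_t + eta_t * s_t(Y_t) + sigma_t * Z_t) / sqrt(alpha_t)."""
    y = rng.standard_normal((n, d))
    for t in range(len(alphas), 0, -1):
        alpha_t = alphas[t - 1]
        eta_t = sigma2_t = 1.0 - alpha_t   # eta_t = sigma_t^2 = 1 - alpha_t
        z = rng.standard_normal((n, d))    # fresh Gaussian noise at every step
        y = (y + eta_t * score(y, t) + np.sqrt(sigma2_t) * z) / np.sqrt(alpha_t)
    return y

# Toy setup (assumption): target N(0, 4 I) and a linear schedule for 1 - alpha_t.
T, d, target_var = 1000, 4, 4.0
alphas = 1.0 - np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(alphas)

def exact_score(y, t):
    # For a Gaussian target, X_t ~ N(0, v_t I) with
    # v_t = alpha_bar_t * target_var + (1 - alpha_bar_t), so the score is -y / v_t.
    v_t = alpha_bars[t - 1] * target_var + (1.0 - alpha_bars[t - 1])
    return -y / v_t

rng = np.random.default_rng(0)
samples = ddpm_sample(exact_score, alphas, n=2000, d=d, rng=rng)
# Should be close to target_var, up to discretization and Monte Carlo error.
print(samples.var())
```

In practice `exact_score` would be replaced by a learned score network; the exact Gaussian score is used here only so that the sketch runs without training and isolates the discretization error of the sampler.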

Paper Structure

This paper contains 33 sections, 20 theorems, 222 equations, 2 tables.

Key Result

Theorem 1

Suppose that the moment assumption on the target distribution holds, and take the coefficients of the DDPM sampler to be $\eta_{t}=\sigma_{t}^{2}=1-\alpha_{t}$. Then there exists some universal constant $c>0$ such that

Theorems & Definitions (39)

  • Theorem 1
  • Definition 1: Intrinsic dimension
  • Theorem 2
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • Lemma 4
  • ...and 29 more