A Sharp Convergence Theory for The Probability Flow ODEs of Diffusion Models
Gen Li, Yuting Wei, Yuejie Chi, Yuxin Chen
TL;DR
This work provides the first non-asymptotic convergence guarantee with nearly linear dimension dependence for the probability flow ODE sampler in diffusion models, under discrete-time dynamics and with only ℓ2-accurate score estimates. When the score estimates are exact, it shows that roughly d/ε iterations (up to logarithmic and lower-order terms) suffice to approximate the target distribution to within ε in total-variation (TV) distance. It also characterizes how estimation errors, in both the scores and their Jacobians, propagate into sampling error. The authors introduce an elementary analysis that handles discretization directly and does not rely on SDE/ODE toolboxes or stochastic calculus. Compared with prior results, it achieves better dependence on dimension and accuracy while imposing only minimal assumptions on the data distribution. The framework also clarifies why Jacobian errors must be controlled in addition to score errors, and it accommodates data distributions with polynomially large supports. Overall, the paper advances the theoretical understanding of deterministic diffusion samplers without resorting to continuous-time machinery.
Abstract
Diffusion models, which convert noise into new data instances by learning to reverse a diffusion process, have become a cornerstone in contemporary generative modeling. In this work, we develop non-asymptotic convergence theory for a popular diffusion-based sampler (i.e., the probability flow ODE sampler) in discrete time, assuming access to $\ell_2$-accurate estimates of the (Stein) score functions. For distributions in $\mathbb{R}^d$, we prove that $d/\varepsilon$ iterations -- modulo some logarithmic and lower-order terms -- are sufficient to approximate the target distribution to within $\varepsilon$ total-variation distance. This is the first result establishing nearly linear dimension-dependency (in $d$) for the probability flow ODE sampler. Imposing only minimal assumptions on the target data distribution (e.g., no smoothness assumption is imposed), our results also characterize how $\ell_2$ score estimation errors affect the quality of the data generation processes. In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach without resorting to SDE and ODE toolboxes.
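As a rough illustration of the kind of deterministic, discrete-time sampler the abstract refers to, the Python sketch below runs a probability-flow-style update on an isotropic Gaussian target whose score is known in closed form, so the score estimates are exact. The linear noise schedule, the first-order update rule $Y_{t-1} = (Y_t + \tfrac{1-\alpha_t}{2}\, s_t(Y_t))/\sqrt{\alpha_t}$, and all function names are assumptions made for this example; they are not taken from the text above and should not be read as the authors' exact algorithm.

```python
import numpy as np


def make_schedule(T, beta_min=1e-4, beta_max=0.02):
    """Hypothetical linear noise schedule (an assumption for this sketch)."""
    betas = np.linspace(beta_min, beta_max, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)  # alpha_bars[t] = product of alphas[0..t]
    return alphas, alpha_bars


def gaussian_score(x, alpha_bar, mu, sigma):
    """Exact score of the noised marginal when the target is N(mu, sigma^2 I).

    With X_t = sqrt(alpha_bar) X_0 + sqrt(1 - alpha_bar) Z, the marginal is
    N(sqrt(alpha_bar) mu, (alpha_bar sigma^2 + 1 - alpha_bar) I).
    """
    mean_t = np.sqrt(alpha_bar) * mu
    var_t = alpha_bar * sigma**2 + (1.0 - alpha_bar)
    return -(x - mean_t) / var_t


def probability_flow_sample(n, mu, sigma, T=1000, seed=0):
    """Deterministic reverse pass: randomness enters only through the initial noise."""
    rng = np.random.default_rng(seed)
    alphas, alpha_bars = make_schedule(T)
    y = rng.standard_normal((n, mu.shape[0]))  # start from N(0, I)
    for t in range(T - 1, -1, -1):
        s = gaussian_score(y, alpha_bars[t], mu, sigma)
        # Assumed first-order discretization of the probability flow ODE.
        y = (y + 0.5 * (1.0 - alphas[t]) * s) / np.sqrt(alphas[t])
    return y


mu = np.array([2.0, -1.0])
samples = probability_flow_sample(20000, mu, sigma=0.5)
print("empirical mean:", samples.mean(axis=0))
print("empirical std: ", samples.std(axis=0))
```

Here `T` plays the role of the iteration count in the abstract. Rerunning with a much smaller `T` gives a feel for how discretization error degrades the generated samples even when the score is exact.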
