Convergence of score-based generative modeling for general data distributions

Holden Lee; Jianfeng Lu; Yixin Tan

TL;DR

This work delivers polynomial-time convergence guarantees for score-based diffusion models, notably DDPMs, under minimal data assumptions. By combining a refined $L^2$-to-$L^\infty$ analysis with a KL-bound that avoids global log-Sobolev inequalities, plus a perturbation-informed link between data distribution shifts and score-function changes, the authors obtain Wasserstein guarantees for bounded-support (and light-tailed) data and TV guarantees under smoothness. The approach relaxes strong structural assumptions that previously limited theoretical guarantees, providing a principled foundation for the empirical success of SGM on multimodal, non-smooth distributions. The results highlight the practical viability of DDPMs in realistic settings and establish a framework for analyzing score-based methods with general data distributions.

Abstract

Score-based generative modeling (SGM) has grown to be a hugely successful method for learning to generate samples from complex data distributions such as those of images and audio. It is based on evolving an SDE that transforms white noise into a sample from the learned distribution, using estimates of the score function, or gradient log-pdf. Previous convergence analyses for these methods have suffered either from strong assumptions on the data distribution or exponential dependencies, and hence fail to give efficient guarantees for the multimodal and non-smooth distributions that arise in practice and for which good empirical performance is observed. We consider a popular kind of SGM -- denoising diffusion models -- and give polynomial convergence guarantees for general data distributions, with no assumptions related to functional inequalities or smoothness. Assuming $L^2$-accurate score estimates, we obtain Wasserstein distance guarantees for any distribution of bounded support or sufficiently decaying tails, as well as TV guarantees for distributions with further smoothness assumptions.
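To make the "evolve an SDE from white noise using the score" idea concrete, here is a minimal one-dimensional sketch. It is an illustration under stated assumptions, not the paper's algorithm: the forward process is assumed to be the Ornstein-Uhlenbeck SDE $dx = -x\,dt + \sqrt{2}\,dW$, the data distribution is a toy Gaussian $N(\mu, s^2)$ whose score is known in closed form (so there is no score-estimation error), and the step sizes are uniform rather than the discretization schedule used in the paper's theorems. All function names and parameters are hypothetical.

```python
import math
import random


def gaussian_score(x, t, mu, s):
    """Exact score of p_t when p_0 = N(mu, s^2) under dx = -x dt + sqrt(2) dW (assumed forward SDE)."""
    mean_t = mu * math.exp(-t)
    var_t = 1.0 + (s * s - 1.0) * math.exp(-2.0 * t)
    return -(x - mean_t) / var_t


def reverse_sample(score, T=5.0, t_min=0.01, n_steps=400, rng=random):
    """Euler-Maruyama on the reverse SDE dy = (y + 2 * score(y, t)) ds + sqrt(2) dW.

    Uniform steps and early stopping at t_min are simplifying assumptions.
    """
    h = (T - t_min) / n_steps
    y = rng.gauss(0.0, 1.0)  # initialize from the stationary law N(0, 1)
    for k in range(n_steps):
        t = T - k * h  # forward time matching this reverse step
        noise = rng.gauss(0.0, 1.0)
        y += h * (y + 2.0 * score(y, t)) + math.sqrt(2.0 * h) * noise
    return y


def run_demo(mu=2.0, s=0.5, n_samples=2000, seed=0):
    """Draw samples and return their empirical mean and standard deviation."""
    rng = random.Random(seed)
    score = lambda x, t: gaussian_score(x, t, mu, s)
    xs = [reverse_sample(score, rng=rng) for _ in range(n_samples)]
    mean = sum(xs) / n_samples
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n_samples)
    return mean, std


if __name__ == "__main__":
    print(run_demo())
```

With the exact score, the empirical mean and standard deviation should land close to the toy values of `mu` and `s`. In the setting of the abstract, the closed-form score would be replaced by a learned estimate that is only $L^2$-accurate, which is the source of error the paper's guarantees account for.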

Paper Structure (17 sections, 30 theorems, 177 equations, 1 algorithm)

Introduction
Problem setting
Prior work on convergence guarantees
Our contributions
Main results
Proof overview
DDPM with $L^\infty$-accurate score estimate
Auxiliary bounds
Bounding the KL divergence
The effect of perturbing the data distribution on the score function
Perturbation under $\chi^2$ error and truncation
Perturbation under TV error
Gaussian tail calculation
Guarantees under $L^2$-accurate score estimate
TV error guarantees
...and 2 more sections

Key Result

Theorem 2.1

Suppose that Assumptions a:score and a:bd hold with $R\ge \sqrt d$. Then there is a sequence of discretization points $0=t_0<t_1<\cdots <t_N<T$ with $N=O(\operatorname{poly}(d,R,1/\varepsilon_{\operatorname{TV}},1/\varepsilon_{\textup{W}}))$ such that if $\varepsilon_\sigma = \widetilde{O}(\ldots)$ …

Theorems & Definitions (58)

Theorem 2.1: Wasserstein+TV error for distributions with bounded support
Theorem 2.2: Wasserstein error for distributions with bounded support
Theorem 2.3: TV error for distributions under smoothness assumption
Lemma 4.1: cf. erdogdu2021convergence, lee2022convergence
proof
Lemma 4.2
proof
Lemma 4.3
proof
Lemma 4.4
...and 48 more
