Minimax Optimality of Score-based Diffusion Models: Beyond the Density Lower Bound Assumptions

Kaihong Zhang; Caitlyn H. Yin; Feng Liang; Jingbo Liu

TL;DR

This work analyzes score-based diffusion model sampling in a nonparametric, large-sample regime and removes the need for density lower bounds by assuming only sub-Gaussian data. It introduces a truncated kernel density estimator to construct a time-varying score estimator whose error bound improves as time increases, and couples it with an early stopping time t_0 to control low-density regions. Using Girsanov's theorem, the authors translate score-estimation errors into total-variation guarantees for the generated data: for a sub-Gaussian ground truth, the diffusion sampler achieves a TV rate of polylog(n) n^{-1/2} t_0^{-d/4}. When the true density lies in a Sobolev class with β ≤ 2 and t_0 is chosen appropriately, the rate improves to the classical minimax rate n^{-β/(2β+d)} up to polylog factors. This establishes nearly minimax optimal sampling performance without the restrictive density-lower-bound assumptions of prior works, broadening the applicability of score-based diffusion models in nonparametric settings.
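To make the kernel-based score estimator concrete, here is a minimal NumPy sketch. It assumes the estimator is the score of the Gaussian-smoothed empirical density $\hat p_t(x)=\frac{1}{n}\sum_{i=1}^n \varphi_t(x-X_i)$, where $\varphi_t$ is the $\mathcal{N}(0,t\boldsymbol{I}_d)$ density, and that truncation means setting the score to zero wherever $\hat p_t(x)\le\rho$. The function name `truncated_kde_score` and the threshold `rho` are hypothetical, and the paper's exact truncation rule and constants may differ.

```python
import numpy as np


def truncated_kde_score(x, data, t, rho):
    """Hypothetical truncated kernel score estimator for p_t = p_0 * N(0, t I_d).

    x    : (m, d) query points
    data : (n, d) i.i.d. samples X_1, ..., X_n from p_0
    t    : diffusion time, used as the Gaussian kernel variance
    rho  : density threshold (assumed rule: score set to 0 where p_hat <= rho)
    Returns (score, p_hat) with shapes (m, d) and (m,).
    """
    x = np.atleast_2d(x)
    n, d = data.shape
    diff = data[None, :, :] - x[:, None, :]            # (m, n, d): X_i - x
    log_k = -np.sum(diff**2, axis=-1) / (2.0 * t)      # log Gaussian kernel, up to a constant
    log_k -= 0.5 * d * np.log(2.0 * np.pi * t)         # normalizing constant of N(0, t I_d)
    m_max = log_k.max(axis=1, keepdims=True)           # log-sum-exp shift for stability
    w = np.exp(log_k - m_max)
    p_hat = np.exp(m_max[:, 0]) * w.sum(axis=1) / n    # hat p_t(x) = (1/n) sum_i phi_t(x - X_i)
    w /= w.sum(axis=1, keepdims=True)                  # posterior weight of each X_i given x
    score = np.einsum("mn,mnd->md", w, diff) / t       # grad hat p_t / hat p_t = sum_i w_i (X_i - x) / t
    score[p_hat <= rho] = 0.0                          # truncate in low-density regions
    return score, p_hat
```

For example, `truncated_kde_score(np.zeros((1, 2)), data, t=0.1, rho=1e-3)` returns the estimated score at the origin together with $\hat p_{0.1}(0)$. The log-sum-exp shift keeps the normalized weights accurate even when every kernel value underflows, which matters for small $t$.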

Abstract

We study the asymptotic error of score-based diffusion model sampling in large-sample scenarios from a non-parametric statistics perspective. We show that a kernel-based score estimator achieves an optimal mean square error of $\widetilde{O}\left(n^{-1} t^{-\frac{d+2}{2}}(t^{\frac{d}{2}} \vee 1)\right)$ for the score function of $p_0*\mathcal{N}(0,t\boldsymbol{I}_d)$, where $n$ and $d$ represent the sample size and the dimension, $t$ is bounded above and below by polynomials of $n$, and $p_0$ is an arbitrary sub-Gaussian distribution. As a consequence, this yields an $\widetilde{O}\left(n^{-1/2} t^{-\frac{d}{4}}\right)$ upper bound for the total variation error of the distribution of the sample generated by the diffusion model under a mere sub-Gaussian assumption. If, in addition, $p_0$ belongs to the nonparametric family of the $\beta$-Sobolev space with $\beta\le 2$, by adopting an early stopping strategy, we obtain that the diffusion model is nearly (up to log factors) minimax optimal. This removes the crucial lower bound assumption on $p_0$ in previous proofs of the minimax optimality of the diffusion model for nonparametric families.
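A heuristic calculation shows how the early stopping time can recover the minimax rate. It is a sketch, not the paper's proof. Write $\widehat{P}$ for the law of the generated sample, and assume, as a standard smoothing-bias heuristic that the paper's argument may not use in this form, that $\mathrm{TV}(p_{t_0},p_0)\lesssim t_0^{\beta/2}$ for $\beta$-Sobolev $p_0$ with $\beta\le 2$. Balancing this against the $\widetilde{O}(n^{-1/2}t_0^{-d/4})$ sampling error gives:

```latex
\begin{align*}
\mathrm{TV}\big(\widehat{P}, p_0\big)
  &\le \underbrace{\mathrm{TV}\big(\widehat{P}, p_{t_0}\big)}_{\widetilde{O}(n^{-1/2} t_0^{-d/4})}
     + \underbrace{\mathrm{TV}\big(p_{t_0}, p_0\big)}_{\lesssim\, t_0^{\beta/2}\ \text{(assumed)}} \\
n^{-1/2} t_0^{-d/4} = t_0^{\beta/2}
  &\iff t_0^{(2\beta+d)/4} = n^{-1/2}
   \iff t_0 = n^{-2/(2\beta+d)} \\
\Rightarrow\ \mathrm{TV}\big(\widehat{P}, p_0\big)
  &= \widetilde{O}\big(t_0^{\beta/2}\big) = \widetilde{O}\big(n^{-\beta/(2\beta+d)}\big)
\end{align*}
```

The balancing choice $t_0=n^{-2/(2\beta+d)}$ has the polynomial form $t_0=n^{-C}$ with $C=2/(2\beta+d)$, so it stays within the range of $t$ covered by the score-estimation bound.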

Paper Structure (44 sections, 37 theorems, 306 equations, 1 figure, 1 algorithm)

Introduction
Main Contributions
Prior Works
Organization
Background
Forward and Backward Processes
Sampling Method
Connections with Ornstein–Uhlenbeck Process
Main Results
Notation
Assumptions
Analysis of Score Estimation Error
Analysis of Estimation Error for Diffusion Model
Proof Overview
Proof Sketch of Theorem \ref{theorem2}
...and 29 more sections

Key Result

Theorem 1.1

Suppose that $p_0$ satisfies Assumption A1, and let $C>0$ be arbitrary. There exists a score estimator $\hat{s}_t(x)$ ($t>t_0$) such that, for the early stopping time $t_0=n^{-C}$, the distribution of the sample from Algorithm 1 differs from $p_{t_0}$ by at most ${\rm polylog}(n)\, n^{-1/2}{t_0}^{-\frac{d}{4}}$ …
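The sampling step can be illustrated with a generic reverse-time SDE sampler with early stopping. This is not claimed to be the paper's Algorithm 1. The sketch assumes the forward process $X_t = X_0 + W_t$ (so that $X_t \sim p_0*\mathcal{N}(0,t\boldsymbol{I}_d)$, as in the abstract), whose time reversal is $dY_s=\nabla\log p_{T-s}(Y_s)\,ds+dB_s$. It also assumes a Gaussian initialization $\mathcal{N}(0,T\boldsymbol{I}_d)$ in place of $p_T$, a geometric time grid, and an Euler–Maruyama discretization. The names `kde_score` and `reverse_sampler` and all parameters are hypothetical.

```python
import numpy as np


def kde_score(y, data, t, rho=0.0):
    """Score of the smoothed empirical density (1/n) sum_i N(y; X_i, t I_d), zeroed where it is <= rho."""
    diff = data[None, :, :] - y[:, None, :]                 # (m, n, d): X_i - y
    log_k = -np.sum(diff**2, axis=-1) / (2.0 * t)
    log_k -= 0.5 * data.shape[1] * np.log(2.0 * np.pi * t)
    m_max = log_k.max(axis=1, keepdims=True)                # log-sum-exp shift for stability
    w = np.exp(log_k - m_max)
    p_hat = np.exp(m_max[:, 0]) * w.mean(axis=1)            # smoothed empirical density at y
    w /= w.sum(axis=1, keepdims=True)
    s = np.einsum("mn,mnd->md", w, diff) / t
    s[p_hat <= rho] = 0.0                                   # assumed truncation rule
    return s


def reverse_sampler(data, num_samples, T, t0, num_steps, rho=0.0, seed=0):
    """Euler-Maruyama for the reverse of dX_t = dW_t, run from time T down to t0 (early stopping).

    Reverse-time SDE: dY_s = grad log p_{T-s}(Y_s) ds + dB_s, with the true score replaced
    by the kernel estimate. Returns (num_samples, d) draws that approximate p_{t0}.
    """
    rng = np.random.default_rng(seed)
    d = data.shape[1]
    times = np.geomspace(T, t0, num_steps + 1)                # decreasing grid, finer near t0
    y = rng.normal(scale=np.sqrt(T), size=(num_samples, d))   # assumed init: N(0, T I_d) in place of p_T
    for t_cur, t_next in zip(times[:-1], times[1:]):
        h = t_cur - t_next                                    # positive step size
        drift = kde_score(y, data, t_cur, rho)
        y = y + h * drift + np.sqrt(h) * rng.standard_normal(y.shape)
    return y
```

Because this score is that of a smoothed empirical measure, the output approximates $\frac{1}{n}\sum_i \mathcal{N}(X_i, t_0\boldsymbol{I}_d)$. As $t_0$ shrinks, the samples therefore concentrate near the training points, which is one way to see why the stopping time $t_0$ enters the error bound.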

Figures (1)

Figure 1: Convergence Rates

Theorems & Definitions (74)

Theorem 1.1: Informal; see Theorem \ref{theorem2} and Theorem \ref{main_theorem2}
Definition 3.2: Sub-Gaussian random vectors (Vershynin, 2018)
Remark 3.3
Theorem 3.5
Remark 3.6
Corollary 3.7
Theorem 3.8
Remark 3.9
Remark 3.10
Proposition 4.1: See also Propositions \ref{MSEphat} and \ref{MSEphatprime}
...and 64 more
