Optimal score estimation via empirical Bayes smoothing

Andre Wibisono; Yihong Wu; Kaylee Yingxi Yang

TL;DR

The paper studies how to estimate the score function of an unknown distribution $\rho^*$ on $\mathbb{R}^d$ from $n$ i.i.d. samples, assuming $\rho^*$ is $\alpha$-subgaussian with a Lipschitz score. It introduces an empirical Bayes smoothing approach based on a regularized Gaussian-kernel density estimate (KDE) and proves that the minimax rate under the $L^2(\rho^*)$ score-matching loss is $\widetilde{\Theta}(n^{-2/(d+4)})$, where the tilde hides logarithmic factors; a matching lower bound establishes optimality. The estimator is $\hat s^{\varepsilon}_h(x) = \nabla \hat\rho_h(x)/\max(\hat\rho_h(x), \varepsilon)$, where $\hat\rho_h$ is the KDE with bandwidth $h$ and $\varepsilon$ is a regularization level, both chosen to balance bias and variance. An extension to $\beta$-Hölder scores ($\beta\le 1$) yields the rate $n^{-2\beta/(d+2\beta+2)}$. The analysis draws on empirical Bayes theory and on the convergence of smoothed empirical distributions in Hellinger distance. The paper also discusses implications for score-based generative models (SGMs): guarantees for the forward Ornstein-Uhlenbeck (OU) process and DDPM translate the score-estimation rate into sampling performance, revealing fundamental sample-complexity limits in high dimensions.
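The regularized estimator $\hat s^{\varepsilon}_h = \nabla \hat\rho_h/\max(\hat\rho_h, \varepsilon)$ can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and it assumes that $h$ is the variance of the Gaussian kernel (so $\hat\rho_h$ is the empirical distribution convolved with $N(0, hI_d)$), which the summary does not specify.

```python
import numpy as np

def kde_and_gradient(x, samples, h):
    """Gaussian-kernel KDE rho_hat_h and its gradient at the query points.

    x:       (m, d) query points
    samples: (n, d) observations X_1, ..., X_n
    h:       kernel variance (assumption: the kernel is N(0, h * I_d))
    """
    x = np.atleast_2d(x)
    n, d = samples.shape
    diff = samples[None, :, :] - x[:, None, :]        # (m, n, d), entries X_i - x
    sq_dist = np.sum(diff ** 2, axis=-1)              # (m, n)
    weights = (2.0 * np.pi * h) ** (-d / 2.0) * np.exp(-sq_dist / (2.0 * h))
    rho = weights.mean(axis=1)                        # (m,) density estimate
    # Gradient of each Gaussian bump in x is weight * (X_i - x) / h.
    grad = (weights[:, :, None] * diff).mean(axis=1) / h   # (m, d)
    return rho, grad

def regularized_score(x, samples, h, eps):
    """Score estimate grad rho_hat_h / max(rho_hat_h, eps) at the query points."""
    rho, grad = kde_and_gradient(x, samples, h)
    return grad / np.maximum(rho, eps)[:, None]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.standard_normal((5000, 2))             # toy data from N(0, I_2)
    queries = np.array([[0.0, 0.0], [1.0, -0.5]])
    # For N(0, I) data, the score of the h-smoothed density is -x / (1 + h).
    print(regularized_score(queries, data, h=0.2, eps=1e-6))
```

The floor $\varepsilon$ only matters where the density estimate is tiny: there the ratio would otherwise be dominated by noise, and clipping the denominator keeps the estimate bounded.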

Abstract

We study the problem of estimating the score function of an unknown probability distribution $\rho^*$ from $n$ independent and identically distributed observations in $d$ dimensions. Assuming that $\rho^*$ is subgaussian and has a Lipschitz-continuous score function $s^*$, we establish the optimal rate of $\widetilde{\Theta}(n^{-2/(d+4)})$ for this estimation problem under the loss function $\|\hat s - s^*\|^2_{L^2(\rho^*)}$ that is commonly used in the score matching literature, highlighting the curse of dimensionality where sample complexity for accurate score estimation grows exponentially with the dimension $d$. Leveraging key insights in empirical Bayes theory as well as a new convergence rate of smoothed empirical distribution in Hellinger distance, we show that a regularized score estimator based on a Gaussian kernel attains this rate, shown optimal by a matching minimax lower bound. We also discuss extensions to estimating $\beta$-Hölder continuous scores with $\beta \leq 1$, as well as the implication of our theory on the sample complexity of score-based generative models.
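To make the curse of dimensionality concrete, the rate can be inverted: if the loss scales like $n^{-2/(d+4)}$, then reaching loss $\delta$ takes roughly $n \approx \delta^{-(d+4)/2}$ samples. The sketch below does this arithmetic while ignoring constants and logarithmic factors, so the numbers indicate orders of magnitude only. The helper `samples_needed` is a hypothetical name, and its optional `beta` argument uses the Hölder rate $n^{-2\beta/(d+2\beta+2)}$ quoted in the TL;DR.

```python
def samples_needed(delta, d, beta=1.0):
    """Rough sample size n at which n^{-2*beta/(d + 2*beta + 2)} equals delta.

    Constants and logarithmic factors are ignored (assumption), so this only
    shows how the requirement scales with the dimension d.
    beta = 1 is the Lipschitz case, giving n = delta^{-(d + 4) / 2}.
    """
    exponent = (d + 2.0 * beta + 2.0) / (2.0 * beta)
    return delta ** (-exponent)

if __name__ == "__main__":
    for d in (1, 2, 8, 16, 32):
        print(f"d = {d:2d}: n ~ {samples_needed(0.1, d):.1e}")
```

For a fixed target $\delta < 1$, each additional pair of dimensions multiplies the required $n$ by a factor of $1/\delta$ in the Lipschitz case, which is the exponential growth in $d$ that the abstract refers to.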

Paper Structure (30 sections, 18 theorems, 128 equations)

Introduction
Main idea
Related work
Empirical Bayes.
Density Estimation.
Score Estimation.
Score-based Generative Models.
Notations and definitions
Main results
Score estimator via Empirical Bayes smoothing
First term:
Second term:
Third term:
Combining the bounds.
Minimax lower bound
...and 15 more sections

Key Result

Theorem 1

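Let $d \ge 1$ be fixed, and suppose $X_1,\dots,X_n$ are drawn i.i.d. from some $\rho^\ast \in \mathcal{P}_{\alpha,L}$, the class of $\alpha$-subgaussian distributions with $L$-Lipschitz score. With the bandwidth $h$ and regularization $\varepsilon$ set appropriately and $n$ sufficiently large, the score estimator $\hat s^{\varepsilon}_h(x) = \nabla \hat\rho_h(x)/\max(\hat\rho_h(x), \varepsilon)$ attains the rate $n^{-2/(d+4)}$ up to logarithmic factors under the loss $\ell(\cdot, \cdot)$, the $L^2(\rho^\ast)$ score-matching loss $\|\hat s - s^\ast\|^2_{L^2(\rho^\ast)}$; the bound involves a universal constant $C > 0$.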

Theorems & Definitions (35)

Theorem 1
proof
Theorem 2
Theorem 3
Proposition 1
Theorem 4
Corollary 1
Proposition 2: SG2020
Lemma 1
proof
...and 25 more
