A Good Score Does not Lead to A Good Generative Model

Sixu Li; Shi Chen; Qin Li

TL;DR

This paper challenges the common belief that a well-learned score function guarantees genuinely generative samples in Score-based Generative Models (SGMs). By constructing a toy KDE-based argument and proving new non-asymptotic results, it shows that an empirical optimal score can render a DDPM that effectively operates as a Gaussian KDE, producing memorized replicas rather than novel samples. The authors establish explicit sample-complexity bounds for the empirical score and prove that, under those conditions, the resulting generator behaves like KDE, highlighting memorization as a fundamental limitation of current SGMs. Through numerical experiments on a 2D Gaussian and CIFAR-10, they illustrate the gap between distributional closeness and generation creativity, arguing for theoretical criteria that explicitly quantify generative novelty alongside imitation. The work underscores the need for new convergence notions that assess not only how close generated distributions are to the target but also how effectively SGMs can produce new, diverse samples.

Abstract

Score-based Generative Models (SGMs) are a leading method in generative modeling, renowned for their ability to generate high-quality samples from complex, high-dimensional data distributions. The method enjoys empirical success and is supported by rigorous theoretical convergence properties. In particular, it has been shown that SGMs can generate samples from a distribution that is close to the ground truth if the underlying score function is learned well, suggesting the success of SGMs as generative models. We provide a counter-example in this paper. Through a sample complexity argument, we provide one specific setting where the score function is learned well. Yet, SGMs in this setting can only output samples that are Gaussian-blurred versions of the training data points, mimicking the effect of kernel density estimation. This finding resonates with a series of recent findings revealing that SGMs can exhibit strong memorization effects and fail to generate.
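
To make the kernel-density connection concrete, the following worked formula sketches why the empirical optimal score is the score of a Gaussian mixture centered at the training points. It assumes the standard Ornstein-Uhlenbeck forward process $dX_t = -X_t\,dt + \sqrt{2}\,dW_t$, so that $\mu_t = e^{-t}$ and $\sigma_t^2 = 1 - e^{-2t}$; this convention, and the names $y_i$ (training samples) and $w_i$ (mixture weights), are assumptions made for illustration and may differ from the paper's notation.

$$
s^N(t,x) = \nabla_x \log\Big(\frac{1}{N}\sum_{i=1}^{N} \mathcal{N}\big(x;\ \mu_t y_i,\ \sigma_t^2 I\big)\Big) = \sum_{i=1}^{N} w_i(t,x)\,\frac{\mu_t y_i - x}{\sigma_t^2}, \qquad w_i(t,x) = \frac{\exp\big(-\|x-\mu_t y_i\|^2 / (2\sigma_t^2)\big)}{\sum_{j=1}^{N}\exp\big(-\|x-\mu_t y_j\|^2 / (2\sigma_t^2)\big)}.
$$

Under these assumptions, at the early stopping time $t = \delta$ the mixture inside the logarithm has centers $e^{-\delta} y_i \approx y_i$ and bandwidth $\sigma_\delta$, which is a Gaussian KDE of the training set. A reverse process driven by $s^N$ therefore targets that KDE rather than the underlying data distribution.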

Paper Structure (19 sections, 23 theorems, 91 equations, 5 figures)

Introduction
A toy model argument
Contributions
Literature review
Score-based Generative Models
Mathematical foundation for DDPM
Score-function, explicit solution and score matching
Error analysis for DDPM
A good score estimate: sample complexity analysis
A bad SGM: memorization Effects
Numerical Experiments
Discussion and Conclusion
Notations
Empirical optimal score function
Approximation error of empirical optimal score function
...and 4 more sections

Key Result

Theorem 2.3

Suppose the bounded second moment assumption and the score estimation error assumption hold, and that $T \geq 1$ and $\delta > 0$. Let $\bar{\mathsf{q}}_{T-\delta}$ be the output of the implementable DDPM process at time $T-\delta$. Then it holds that

Figures (5)

Figure 1: Images generated from the CIFAR-10 dataset. The first row shows the original images, the second row shows the images blurred according to the Gaussian KDE, and the third row shows images generated by an SGM equipped with the perfect score function learned from samples. Both the KDE and the SGM produce simple replicas (with Gaussian blurring) of the original images.
Figure 2: Score approximation error of the empirical optimal score function (the full score approximation error) versus the number of training samples $N$. Both axes use logarithmic scales. The orange crosses represent the score approximation error for varying values of $N$, with a fitted blue trend line. The green dashed lines are reference lines with slope $-1$, showing that the slope of the blue line is also approximately $-1$. This observation corroborates the rate $O(\frac{1}{N})$ given in Theorem 3.1.
Figure 3: Left: Samples generated by DDPM with the empirical optimal score function $s^N(t,x)$. Right: Samples generated by DDPM with the true score function $u(t,x)$. In both plots, the blue crosses are the training samples, the green dots are the initialization positions, and the orange dots are the outputs of DDPM with early stopping at $\delta=0.01$.
Figure 4: Left: Samples generated by DDPM with the empirical optimal score function $s^N(t,x)$. Right: Samples generated by DDPM with the true score function $u(t,x)$. Both algorithms are run up to time $T = 5$, i.e., with early stopping time $\delta = 0$. The blue crosses are the training samples, the green dots are the initialization positions, and the orange points are the generated samples.
Figure 5: Left: Samples generated by DDPM with the empirical optimal score function $s^N(t,x)$. Right: Samples generated by DDPM with the true score function $u(t,x)$. Both algorithms are stopped early with $\delta = 0.01$. The blue crosses are the training samples, the green dots are the initialization positions, and the orange points are the generated samples.
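
The memorization behavior described in the captions of Figures 3 to 5 can be reproduced qualitatively with a short sketch. The code below is not the authors' implementation: it assumes the Ornstein-Uhlenbeck forward process $dX_t = -X_t\,dt + \sqrt{2}\,dW_t$, a plain Euler-Maruyama discretization of the reverse SDE, and a synthetic 2D standard Gaussian training set. The function names (`empirical_score`, `ddpm_sample`, `nearest_train_distance`) and all parameter values other than $T = 5$ and $\delta = 0.01$ are hypothetical choices for illustration.

```python
import numpy as np


def empirical_score(t, x, train):
    """Score of the mixture (1/N) sum_i N(mu_t y_i, var_t I) evaluated at x.

    Assumes the OU forward process, so mu_t = exp(-t), var_t = 1 - exp(-2t).
    x has shape (M, d), train has shape (N, d); returns shape (M, d).
    """
    mu, var = np.exp(-t), 1.0 - np.exp(-2.0 * t)
    diff = mu * train[None, :, :] - x[:, None, :]          # (M, N, d)
    logits = -np.sum(diff ** 2, axis=2) / (2.0 * var)       # (M, N)
    logits -= logits.max(axis=1, keepdims=True)             # stabilize softmax
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)
    return np.einsum("mn,mnd->md", w, diff) / var


def ddpm_sample(train, n_samples, T=5.0, delta=0.01, n_steps=2000, seed=0):
    """Euler-Maruyama on the reverse SDE, run from time T down to delta."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_samples, train.shape[1]))   # start from N(0, I)
    h = (T - delta) / n_steps
    for k in range(n_steps):
        t = T - k * h                                        # current forward time
        drift = x + 2.0 * empirical_score(t, x, train)
        x = x + h * drift + np.sqrt(2.0 * h) * rng.standard_normal(x.shape)
    return x


def nearest_train_distance(samples, train):
    """Distance from each generated sample to its closest training point."""
    d = np.linalg.norm(samples[:, None, :] - train[None, :, :], axis=2)
    return d.min(axis=1)


# Synthetic training set (an assumption, not the paper's data).
train = np.random.default_rng(1).standard_normal((20, 2))
samples = ddpm_sample(train, n_samples=200)
dist = nearest_train_distance(samples, train)
print("mean distance to nearest training point:", dist.mean())
```

Under these assumptions, the generated points are expected to sit within roughly one KDE bandwidth $\sigma_\delta = \sqrt{1 - e^{-2\delta}}$ of a training point, which is the blurred-replica behavior the figures illustrate. Swapping `empirical_score` for the true score of the data distribution would be the natural comparison for the right-hand panels.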

Theorems & Definitions (43)

Theorem 2.3: Modified version of Theorem 1 in [benton2023linear]
Theorem 3.1: Approximation error of empirical optimal score function
Remark 3.2
Proof: Sketch of proof
Proposition 4.1
Proposition 4.2
Theorem 4.3: SGM with empirical optimal score function resembles KDE
Lemma 2.1
proof
Lemma 2.2
...and 33 more
