On the Interpolation Effect of Score Smoothing in Diffusion Models

Zhengdao Chen

TL;DR

The paper examines how score smoothing, induced by regularization in neural score estimators, biases diffusion-model denoising toward interpolation along the training subspace rather than memorization. In a 1D theoretical model, it shows that regularized two-layer ReLU networks approximately learn a smoothed piecewise-linear version of the empirical score function (the Smoothed PL-ESF), gives a variational argument that near-minimizers have smoothing parameter $\delta_t \propto \sqrt{t}$, and derives analytic flow dynamics for the resulting denoising process. In higher dimensions, the analysis yields a tangent-normal decomposition: smoothing produces interpolation along the subspace while the normal components shrink, so the subspace is recovered without memorization, in contrast to naive early stopping. Numerical experiments corroborate the interpolation effect: the NN-learned score function (SF) closely matches the Smoothed PL-ESF and yields interpolating samples on linear and circular manifolds, even without explicit regularization. Overall, the work gives a mechanistic account of how score smoothing under NN training can let diffusion models generate samples beyond the training data, which may inform the design of score estimators and regularization strategies.

Abstract

Score-based diffusion models have achieved remarkable progress in various domains with the ability to generate new data samples that do not exist in the training set. In this work, we study the hypothesis that such creativity arises from an interpolation effect caused by a smoothing of the empirical score function. Focusing on settings where the training set lies uniformly in a one-dimensional subspace, we show theoretically how regularized two-layer ReLU neural networks tend to learn approximately a smoothed version of the empirical score function, and further probe the interplay between score smoothing and the denoising dynamics with analytical solutions and numerical experiments. In particular, we demonstrate how a smoothed score function can lead to the generation of samples that interpolate the training data along their subspace while avoiding full memorization. Moreover, we present experimental evidence that learning score functions with neural networks indeed induces a score smoothing effect, including in simple nonlinear settings and without explicit regularization.
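
As a toy illustration of the mechanism described above, the following sketch compares denoising under the exact empirical score function with denoising under a hand-made smoothed score, for two training points at $\pm 1$ in 1-D. It is not the paper's code and rests on several assumptions: the noised empirical distribution is taken to be $p^{(n)}_t = \frac{1}{n}\sum_k \mathcal{N}(y_k, t)$ (variance $t$), denoising follows the probability flow ODE $dx/dt = -\tfrac{1}{2} s_t(x)$ integrated backward in time with Euler steps, and `clipped_score` is a hypothetical clipped-linear surrogate that only stands in for a smoothed score (it need not coincide with the Smoothed PL-ESF). All function names are illustrative.

```python
import numpy as np


def empirical_score(x, data, t):
    """Score of the Gaussian mixture (1/n) * sum_k N(y_k, t) at 1-D points x.

    Assumption: the noise at time t has variance t.
    """
    x = np.asarray(x, dtype=float)[..., None]      # shape (..., 1)
    diffs = data - x                               # y_k - x, shape (..., n)
    logits = -diffs**2 / (2.0 * t)
    logits -= logits.max(axis=-1, keepdims=True)   # stabilise the softmax
    w = np.exp(logits)
    w /= w.sum(axis=-1, keepdims=True)
    return (w * diffs).sum(axis=-1) / t


def clipped_score(x, t, delta=1.0):
    """Hypothetical smoothed score for training points at -1 and +1.

    Replaces the sharp switch between the two points by a linear ramp of
    half-width delta. With delta = 1 the score is zero on [-1, 1].
    """
    x = np.asarray(x, dtype=float)
    target = np.clip(x / delta, -1.0, 1.0)
    return (target - x) / t


def denoise(x, score_fn, t0, t_min=1e-5, steps=2000):
    """Euler integration of dx/dt = -0.5 * score(x, t) from t0 down to t_min."""
    ts = np.geomspace(t0, t_min, steps + 1)        # log-spaced, decreasing
    x = np.array(x, dtype=float)
    for t_hi, t_lo in zip(ts[:-1], ts[1:]):
        # Moving backward in time by (t_hi - t_lo) flips the sign of the ODE.
        x = x + 0.5 * (t_hi - t_lo) * score_fn(x, t_hi)
    return x


rng = np.random.default_rng(0)
data = np.array([-1.0, 1.0])
t0 = 0.25

# Samples from the noised empirical distribution at time t0.
x_t0 = rng.choice(data, size=2000) + np.sqrt(t0) * rng.standard_normal(2000)

x_esf = denoise(x_t0, lambda x, t: empirical_score(x, data, t), t0)
x_smooth = denoise(x_t0, lambda x, t: clipped_score(x, t), t0)

print("ESF: fraction within 0.05 of a training point:",
      np.mean(np.abs(np.abs(x_esf) - 1.0) < 0.05))
print("Clipped score: fraction strictly between the points (|x| < 0.9):",
      np.mean(np.abs(x_smooth) < 0.9))
```

Under these assumptions, the empirical score drives almost every sample onto one of the two training points, whereas the clipped surrogate with `delta = 1` has zero score on $[-1, 1]$, so samples that start between the training points stay there and only samples outside the segment are pulled back to its endpoints.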

Paper Structure

This paper contains 44 sections, 14 theorems, 85 equations, 9 figures.

Key Result

Proposition 1

Given $\epsilon \in (0, 0.015)$, for any $\kappa \geq F^{-1}(\epsilon)$, where $F$ is a computable function that decreases strictly from $1$ to $0$ on $[0, \infty)$, there exists $t_1 > 0$ (dependent on $\kappa$) such that $\hat{s}^{(n)}_{t, \delta_t}$ with $\delta_t = \kappa \sqrt{t}$ satisfies the

Figures (9)

  • Figure 1: From the noised empirical distribution ($p^{(n)}_{t_0}$; middle), denoising with the ESF ($\nabla \log p^{(n)}_t$) leads back to the empirical distribution of the training set ($p^{(n)}_{0}$; top), while using a smoothed SF (e.g., the Smoothed PL-ESF, $\hat{\bm{s}}^{(n)}_{t, \delta_t}$, or an NN-learned SF) produces a distribution that interpolates among the training set on the relevant subspace (e.g., $\hat{p}^{(n, t_0)}_0$ in the case of the Smoothed PL-ESF; bottom). Definitions are given in Sections \ref{sec:background} - \ref{sec:back_sm}.
  • Figure 2: Similarities between the NN-learned SF ($s^{\text{NN}}_{t, \lambda}$) under increasing strengths of regularization, $\lambda$ (left), and the Smoothed PL-ESF ($\hat{s}^{(n)}_{t, \delta}$) with decreasing values of $\delta$ (right) at a fixed $t$ in the $1$-D setting with two training data points at $\pm 1$. The detailed setup is discussed in Section \ref{app:details_exp_1}.
  • Figure 3: Phase diagram in the $x$-$\sqrt{t}$ plane for the flow solution (\ref{eq:phi_st}) of the dynamics (\ref{eq:back_sm}) in the $d=1$, $n=2$ case analyzed in Section \ref{sec:back_sm}.
  • Figure 4: Results of the experiment in Section \ref{sec:exp_score}. Each column shows the denoising process under one of $3$ choices of SFs, which starts from the distribution $p^{(n)}_{t_0}$ at $t_0$ and evolves backward in time following the respective SF. At $t = t_0$, $t_0/4$ and $t_{\min}=10^{-5}$, we plot (a) the samples from the denoising processes in ${\mathbb{R}}^2$ and (b) the density histograms (log scale) of their first dimension. In (b), the colored curves are the analytical predictions of $\hat{p}^{(n, t_0)}_t$ (for $t = t_0$, $t_0/4$) and $\tilde{p}^{(n, t_0)}_0$ (for $t=t_{\min}$), with the formulas given in Appendix \ref{app:denoise_n>2}. A video animation of the three denoising processes and the evolution of the corresponding SFs can be found at https://www.dropbox.com/scl/fo/2dx7awl6nvajktgfovf7m/AGsUqHiFa9wvx0-RxM3HVh4?rlkey=0r1qz8ulq63cyw7xhydmcdgyd&st=hybi5oad&dl=0.
  • Figure 5: Experiment in Section \ref{sec:exp_circle} with training data spaced uniformly on the unit circle in ${\mathbb{R}}^2$. Top: Samples from the beginning and end of the denoising process with the NN-learned SF. Middle and bottom: Visualization of the NN-learned SF vs. the ESF at $t = t_0 / 8$ as vector fields, with the length corresponding to the magnitude and the color determined by the angular direction (red for clockwise, blue for counter-clockwise).
  • ...and 4 more figures

Theorems & Definitions (15)

  • Proposition 1
  • Lemma 2
  • Proposition 3
  • Proposition 4
  • Proposition 5
  • Proposition 6
  • Lemma 7
  • Lemma 8
  • Lemma 9
  • Lemma 10
  • ...and 5 more