Dimension-free Score Matching and Time Bootstrapping for Diffusion Models

Syamantak Kumar, Dheeraj Nagaraj, Purnamrita Sarkar

TL;DR

This work addresses the dimension-dependent sample complexity of learning score functions for diffusion models. It introduces a joint-time learning approach using a single function class across noise levels, supported by a novel martingale-based error decomposition and sharp variance bounds, to achieve nearly dimension-free generalization up to a $\log\log(d)$ factor. A key contribution is Bootstrapped Score Matching (BSM), which reduces variance across timesteps by bootstrapping targets from previous scores. Collectively, these results explain the observed efficiency of diffusion-model training in high dimensions and offer a practical variance-reduction technique, with potential extensions to flow-matching frameworks.

Abstract

Diffusion models generate samples by estimating the score function of the target distribution at various noise levels. The model is trained using samples drawn from the target distribution by progressively adding noise. Previous sample complexity bounds have polynomial dependence on the dimension $d$, apart from a $\log(|\mathcal{H}|)$ term, where $\mathcal{H}$ is the hypothesis class. In this work, we establish the first (nearly) dimension-free sample complexity bounds, modulo the $\log(|\mathcal{H}|)$ dependence, for learning these score functions, achieving a double exponential improvement in the dimension over prior results. A key aspect of our analysis is the use of a single function approximator to jointly estimate scores across noise levels, a practical feature that enables generalization across time steps. We introduce a martingale-based error decomposition and sharp variance bounds, enabling efficient learning from dependent data generated by Markov processes, which may be of independent interest. Building on these insights, we propose Bootstrapped Score Matching (BSM), a variance reduction technique that leverages previously learned scores to improve accuracy at higher noise levels. These results provide insights into the efficiency and effectiveness of diffusion models for generative modeling.
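To make the joint-time setup concrete, the following is a minimal sketch of a denoising score matching objective averaged over a grid of noise levels and evaluated with one time-conditioned function. It assumes an Ornstein-Uhlenbeck forward process $x_t = e^{-t}x_0 + \sigma_t z$ with $\sigma_t^2 = 1 - e^{-2t}$, which the abstract does not specify. The names `forward_noise` and `dsm_loss`, the toy score functions, and the parameter values are hypothetical and are not taken from the paper.

```python
import numpy as np

def forward_noise(x0, t, rng):
    """Assumed OU forward process: x_t = exp(-t) * x0 + sigma_t * z."""
    sigma_t = np.sqrt(1.0 - np.exp(-2.0 * t))
    z = rng.standard_normal(x0.shape)
    return np.exp(-t) * x0 + sigma_t * z, z, sigma_t

def dsm_loss(score_fn, x0, times, rng):
    """Denoising score matching loss averaged over all noise levels.

    A single function score_fn(t, x) is evaluated at every time,
    mirroring the 'one function class across noise levels' idea.
    """
    per_time = []
    for t in times:
        x_t, z, sigma_t = forward_noise(x0, t, rng)
        # Score of the Gaussian conditional p(x_t | x_0) is -z / sigma_t.
        target = -z / sigma_t
        residual = score_fn(t, x_t) - target
        per_time.append(np.mean(np.sum(residual ** 2, axis=1)))
    return float(np.mean(per_time))

# Illustrative sizes (assumptions): d dimensions, m samples, N time steps.
d, m, N, step = 5, 4000, 10, 0.1
times = step * np.arange(1, N + 1)          # t_j = step * j for j = 1..N
x0 = np.random.default_rng(0).standard_normal((m, d))

# For standard Gaussian data the marginal stays N(0, I), so the true score is -x.
true_score = lambda t, x: -x
zero_score = lambda t, x: np.zeros_like(x)

# Re-using the seed gives both candidates the same noise draws.
loss_true = dsm_loss(true_score, x0, times, np.random.default_rng(1))
loss_zero = dsm_loss(zero_score, x0, times, np.random.default_rng(1))
```

In this toy setting the true score should attain a lower loss than the zero function, and the gap should be close to $d$, the expected squared norm of the true score under the assumed process.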

Paper Structure

This paper contains 18 sections, 51 theorems, 213 equations, 3 figures, 1 algorithm.

Key Result

Theorem 1

Let Assumption \ref{assumption:score_function_smoothness} hold. Fix $\delta \in \left(0,1\right)$. For all $j \in [N]$, let $t_{j} := \Delta j$ and $\gamma_{j} := \Delta$. Let $B := C\log\left(\left(L+1\right)dmN\log\left(\frac{1}{\delta}\right)/\Delta\right)$ for an absolute constant $C > 0$, and let $\D with probability at least $1-\delta$,

Figures (3)

  • Figure 1: (a) Empirical $L_2$ error (\ref{eq:dsm_total}), scaled inversely by $\log\left(|\mathcal{H}|\right)\log\log\left(d\right)$, on a log-log scale. A linear fit to the points shows a nearly zero slope, consistent with our $\log \log d$ dimension dependence. (b) Comparison of the scaled empirical $L_2$ error with the scaled error that would result from a linear dimension dependence, as in prior works. As discussed subsequently, all previous works provide scaled error bounds with at least a linear dependence.
  • Figure 2: Dependency graph of the key lemmas leading to Theorem \ref{theorem:empirical_l2_error_bound}.
  • Figure 3: Experiments with Bootstrapped Score Matching. (a) shows the $L_2$ error at each timestep when performing score estimation for a multivariate Gaussian density. In this case, since the score function is linear, \ref{eq:dsm_total} can be solved exactly without a neural network. We note that BSM significantly enhances the quality of the score function. (b) explores multimodal densities, specifically a mixture of Gaussians. Here, we use a 3-layer neural network to represent the score function and plot the empirical density learned by using \ref{eq:reverse_sde} with different score estimation algorithms. We note that using score bootstrapping significantly enhances the proportional representation of the minor mode, leading to a fairer output. We provide details of the experimental setup in Appendix Section \ref{appendix:bsm}.
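The remark in the Figure 3 caption that a Gaussian target has a linear score can be illustrated with a short sketch. Assuming a zero-mean Gaussian target and an Ornstein-Uhlenbeck forward process (both are assumptions here, since the experimental setup is deferred to the appendix), the score at time $t$ is $-\Sigma_t^{-1}x$ with $\Sigma_t = e^{-2t}\Sigma + (1-e^{-2t})I$, so a linear score fitted by ordinary least squares on the denoising targets should approach $-\Sigma_t^{-1}$. The covariance `Sigma`, the sample size, and the time `t` below are hypothetical illustration values; this is plain denoising score matching at one noise level, not the paper's BSM procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, t = 3, 20000, 0.5  # illustrative sizes, not from the paper

# Hypothetical positive-definite target covariance, for illustration only.
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 0.5]])

# Draw x0 ~ N(0, Sigma) and noise it with the assumed OU forward process.
x0 = rng.standard_normal((m, d)) @ np.linalg.cholesky(Sigma).T
sigma_t = np.sqrt(1.0 - np.exp(-2.0 * t))
z = rng.standard_normal((m, d))
x_t = np.exp(-t) * x0 + sigma_t * z

# Denoising targets: the conditional score of p(x_t | x_0) is -z / sigma_t.
targets = -z / sigma_t

# Fit a linear score s(x) = A x by least squares: x_t @ W ~ targets, A = W^T.
W, *_ = np.linalg.lstsq(x_t, targets, rcond=None)
A_hat = W.T

# Closed-form score matrix for the Gaussian marginal at time t.
Sigma_t = np.exp(-2.0 * t) * Sigma + sigma_t ** 2 * np.eye(d)
A_true = -np.linalg.inv(Sigma_t)

max_abs_err = float(np.max(np.abs(A_hat - A_true)))
```

Here `max_abs_err` measures how far the fitted matrix is from the closed-form one; it should shrink as the number of samples `m` grows.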

Theorems & Definitions (98)

  • Definition 1: $\left(\beta^2,K\right)$-subGaussianity
  • Theorem 1: Empirical $L_2$ Bound
  • Remark 1
  • Remark 2
  • Remark 3
  • Theorem 2: $L_2$ Error Bound
  • Remark 4
  • Theorem 3: Fast Inference
  • Lemma 1
  • Lemma 2
  • ...and 88 more