Improved Sample Complexity For Diffusion Model Training Without Empirical Risk Minimizer Access

Mudit Gaur, Prashant Trivedi, Sasidhar Kunapuli, Amrit Singh Bedi, Vaneet Aggarwal

TL;DR

The paper studies the sample complexity of training score-based diffusion models without access to exact empirical risk minimizers (ERM). It formalizes the forward–backward SDE framework, discretizes it via a DDPM-style sequence, and decomposes the score estimation error into approximation, statistical, and optimization components under a Polyak–Łojasiewicz-type condition. From this the authors derive a finite-sample bound on the total variation distance with a sample complexity of $\tilde{\mathcal{O}}(\epsilon^{-4})$. The bound holds without ERM access and avoids exponential dependence on neural network parameters, a theoretical advance over prior results that required ERM. The results indicate that diffusion models can achieve arbitrarily small distributional discrepancy with polynomially many samples, and the paper outlines directions for extending the guarantees to conditional generation and related setups.
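To make "training without ERM access" concrete, here is a minimal toy sketch in Python. It is not the paper's algorithm or setting: it assumes 1-D Gaussian data, an Ornstein–Uhlenbeck forward process, and a hypothetical linear score model `s(x) = a * x`, and it runs plain mini-batch SGD on the denoising score matching loss. The names `dsm_sgd` and `true_score_slope` are introduced here for illustration only.

```python
import numpy as np

def true_score_slope(sigma, t):
    """Slope of the exact score of x_t when x_0 ~ N(0, sigma^2).

    Under the OU forward process x_t = e^{-t} x_0 + sqrt(1 - e^{-2t}) z,
    x_t is Gaussian with variance v_t, so its score is -x / v_t.
    """
    v_t = np.exp(-2.0 * t) * sigma**2 + (1.0 - np.exp(-2.0 * t))
    return -1.0 / v_t

def dsm_sgd(x0, t, steps=2000, lr=0.02, batch=64, seed=0):
    """Fit s(x) = a * x at noise time t by SGD on the denoising score
    matching loss. The output is an SGD iterate, not an exact minimizer
    of the empirical loss."""
    rng = np.random.default_rng(seed)
    mean_scale = np.exp(-t)              # e^{-t}
    var = 1.0 - np.exp(-2.0 * t)         # variance of the added noise
    a = 0.0
    for _ in range(steps):
        xb = rng.choice(x0, size=batch)  # mini-batch of training samples
        z = rng.standard_normal(batch)
        xt = mean_scale * xb + np.sqrt(var) * z
        target = -z / np.sqrt(var)       # score of the Gaussian transition kernel
        grad = np.mean(2.0 * (a * xt - target) * xt)  # d/da of squared error
        a -= lr * grad
    return a

if __name__ == "__main__":
    data = 2.0 * np.random.default_rng(1).standard_normal(5000)
    print("SGD slope :", dsm_sgd(data, t=0.5))
    print("true slope:", true_score_slope(2.0, 0.5))
```

In this toy case the linear model contains the true score, so the approximation error is zero. The remaining gap between the two slopes comes from the finite sample (statistical error) and from stopping SGD after finitely many noisy steps (optimization error).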

Abstract

Diffusion models have demonstrated state-of-the-art performance across vision, language, and scientific domains. Despite their empirical success, prior theoretical analyses of the sample complexity suffer from poor scaling with input data dimension or rely on unrealistic assumptions such as access to exact empirical risk minimizers. In this work, we provide a principled analysis of score estimation, establishing a sample complexity bound of $\mathcal{O}(\epsilon^{-4})$. Our approach leverages a structured decomposition of the score estimation error into statistical, approximation, and optimization errors, enabling us to eliminate the exponential dependence on neural network parameters that arises in prior analyses. It is the first such result that achieves sample complexity bounds without assuming access to the empirical risk minimizer of score function estimation loss.
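
One way to read the three-way decomposition, offered as a sketch rather than the paper's exact statement: write $s_t$ for the true score, $s_t^{a}$ for the minimizer of the population loss within the network class, $s_t^{b}$ for the empirical risk minimizer on the training samples, and $\hat{s}_t$ for the network actually returned by training. These symbols are assumptions introduced here for illustration; the paper's notation and constants may differ. Since $\hat{s}_t - s_t = (s_t^{a} - s_t) + (s_t^{b} - s_t^{a}) + (\hat{s}_t - s_t^{b})$ and $\|u + v + w\|^2 \le 3(\|u\|^2 + \|v\|^2 + \|w\|^2)$,

$$
\mathbb{E}\,\|\hat{s}_t - s_t\|^2 \;\le\; 3\,\underbrace{\mathbb{E}\,\|s_t^{a} - s_t\|^2}_{\text{approximation}} \;+\; 3\,\underbrace{\mathbb{E}\,\|s_t^{b} - s_t^{a}\|^2}_{\text{statistical}} \;+\; 3\,\underbrace{\mathbb{E}\,\|\hat{s}_t - s_t^{b}\|^2}_{\text{optimization}}.
$$

Under this reading, assuming ERM access amounts to setting the third term to zero; an analysis without that assumption has to bound it explicitly.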

Paper Structure

This paper contains 18 sections, 17 theorems, 173 equations, 1 algorithm.

Key Result

Theorem 1

Let $p_{t_0}$ denote the distribution obtained by the backward process until time $t_{0}$ starting from $p_{T}$, and let $\hat{p}_{t_k}(x)$ be the distribution generated by the backward process at discretized time steps $\{t_k\}$ using the estimated score functions $\hat{s}_{T - t_k}(x)$, where $k \in [0, \ldots]$. Then, with probability at least $1 - \delta$, the total variation distance between $p_{t_0}$ and …

Theorems & Definitions (29)

  • Theorem 1: Total Variation Distance Bound
  • Theorem 2: Total Variation Distance Bound Under Sub-Gaussian Assumption
  • Lemma 1: Approximation Error
  • Lemma 2: Statistical Error
  • Lemma 3: Optimization Error
  • proof
  • proof
  • Lemma 4: TV bound via Girsanov for reverse diffusions
  • proof
  • Lemma 5: Theorem 26.5 of Shalev-Shwartz and Ben-David (2014)
  • ...and 19 more