On the Generalization Properties of Diffusion Models

Puheng Li, Zhong Li, Huishuai Zhang, Jiang Bian

TL;DR

This work provides a rigorous theoretical framework for understanding the generalization of diffusion models by tying the learning dynamics to a KL-divergence-based generalization gap. It introduces a score-network construction based on random features within an RKHS and derives data-independent bounds showing a polynomial dependence on the sample size $n$ and the model capacity $m$. In the early-stopping regime these bounds yield $D_{KL}(p_0 \| p_{0, \hat{\bm{\theta}}_n(\tau)}) = O(n^{-2/5}+m^{-4/5})$. It further extends the analysis to data-dependent targets exhibiting mode shift, showing how increasing mode separation degrades generalization, and validates the theory with extensive numerical experiments on synthetic mixtures and MNIST. The results clarify fundamental limits of, and offer practical guidance for, training diffusion models, with implications for memorization and privacy in real-world deployments. Overall, the paper advances a principled understanding of diffusion-model generalization and contributes quantitative insights that can inform practice and future theory.
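The random-feature score network mentioned above can be illustrated with a minimal one-dimensional sketch. Every concrete choice here is an assumption made for illustration, not the paper's construction: a two-mode target $\frac12\mathcal{N}(-\mu,1)+\frac12\mathcal{N}(\mu,1)$, an Ornstein-Uhlenbeck forward process, a single noise time $t$, ReLU features with frozen Gaussian weights, and the hypothetical helpers `relu_features`, `mixture_score` and `fit_score_rf`. Only the output weights are trained, by gradient descent on a denoising score-matching loss, and the number of steps loosely plays the role of the training time $\tau$.

```python
import numpy as np

def relu_features(x, w, b):
    """Frozen ReLU random features phi_j(x) = max(w_j * x + b_j, 0) / sqrt(m)."""
    return np.maximum(np.outer(x, w) + b, 0.0) / np.sqrt(w.shape[0])

def mixture_score(x, mu, t):
    """Exact score of p_t for p_0 = 0.5 N(-mu, 1) + 0.5 N(mu, 1) under the assumed
    OU forward process x_t = e^{-t} x_0 + sqrt(1 - e^{-2t}) z."""
    a = mu * np.exp(-t)
    return -x + a * np.tanh(a * x)

def fit_score_rf(n=2000, m=200, mu=2.0, t=1.0, steps=2000, lr=0.5, seed=0):
    """Train only the output weights theta by gradient descent on a
    denoising score-matching loss at a single noise time t (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    x0 = rng.choice([-mu, mu], size=n) + rng.normal(size=n)  # samples from p_0
    sigma = np.sqrt(1.0 - np.exp(-2.0 * t))
    z = rng.normal(size=n)
    xt = np.exp(-t) * x0 + sigma * z        # forward-perturbed samples
    target = -z / sigma                     # grad_x log p(x_t | x_0)
    w, b = rng.normal(size=m), rng.normal(size=m)
    phi = relu_features(xt, w, b)
    theta = np.zeros(m)
    for _ in range(steps):                  # 'steps' loosely stands in for tau
        theta -= lr * phi.T @ (phi @ theta - target) / n
    return w, b, theta

w, b, theta = fit_score_rf()
grid = np.linspace(-2.0, 2.0, 101)
pred = relu_features(grid, w, b) @ theta
true = mixture_score(grid, mu=2.0, t=1.0)
err = float(np.mean((pred - true) ** 2))    # squared error of the fitted score on the grid
zero_err = float(np.mean(true ** 2))        # same error for the trivial predictor 0
print(err, zero_err)
```

Comparing `err` with `zero_err` gives a rough check that the fit recovers the score. Varying `n`, `m` and `steps` is one way to probe the trade-offs the bound describes, although this toy does not reproduce the paper's rates.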

Abstract

Diffusion models are a class of generative models that serve to establish a stochastic transport map between an empirically observed, yet unknown, target distribution and a known prior. Despite their remarkable success in real-world applications, a theoretical understanding of their generalization capabilities remains underdeveloped. This work embarks on a comprehensive theoretical exploration of the generalization attributes of diffusion models. We establish theoretical estimates of the generalization gap that evolves in tandem with the training dynamics of score-based diffusion models, suggesting a polynomially small generalization error ($O(n^{-2/5}+m^{-4/5})$) in both the sample size $n$ and the model capacity $m$, which evades the curse of dimensionality (i.e., is not exponentially large in the data dimension) when training is early-stopped. Furthermore, we extend our quantitative analysis to a data-dependent scenario, wherein target distributions are portrayed as a succession of densities with progressively increasing distances between modes. This precisely elucidates the adverse effect of "modes shift" in ground truths on the model generalization. Moreover, these estimates are not solely theoretical constructs but have also been confirmed through numerical simulations. Our findings contribute to the rigorous understanding of diffusion models' generalization properties and provide insights that may guide practical applications.
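As a small worked consequence of the stated rate, and treating the constants hidden in $O(\cdot)$ as equal (an assumption the bound itself does not make), the two error terms balance when the capacity grows like the square root of the sample size:

```latex
m^{-4/5} = n^{-2/5}
\iff m^{4/5} = n^{2/5}
\iff m = n^{1/2},
\qquad\text{so}\qquad
O\!\left(n^{-2/5} + m^{-4/5}\right) = O\!\left(n^{-2/5}\right)
\quad\text{whenever } m \gtrsim n^{1/2}.
```

Under this scaling, doubling $n$ shrinks the bound by a factor of about $2^{-2/5} \approx 0.76$.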

Paper Structure (31 sections, 10 theorems, 105 equations, 11 figures)

Key Result

Theorem 1

Suppose that the target distribution $p_0$ is continuously differentiable and has compact support, i.e., $\|\bm{x}\|_{\infty}$ is uniformly bounded, and that there exists a reproducing kernel Hilbert space (RKHS) $\mathcal{H} := \mathcal{H}_{k_{\rho_0}}$ such that $\bar{\bm{s}}_{0,\bar{\bm{\theta}}\ldots}$ … where $\lesssim$ hides the term $d \log (d+1)$, polynomials of $\log(1/\delta^2)$, finite RKHS …

Figures (11)

  • Figure 1: Illustration of the problem formulation and important notations.
  • Figure 2: An illustration of modes shift.
  • Figure 3: The KL divergence dynamics.
  • Figure 4: The training dynamics when the distance between two modes is 6 ($\mu = 3$).
  • Figure 5: The training dynamics when the distance between two modes is 30 ($\mu = 15$).
  • ...and 6 more figures
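The two-mode setup behind Figures 4 and 5 can be sketched as follows. This assumes (it is not stated above) that the modes are unit-variance Gaussians centred at $\pm\mu$, so their distance is $2\mu$, matching "distance 6 ($\mu = 3$)" and "distance 30 ($\mu = 15$)", and that the forward process is the Ornstein-Uhlenbeck process $x_t = e^{-t}x_0 + \sqrt{1-e^{-2t}}\,z$. Under those assumptions $p_t$ remains a two-mode mixture with means $\pm\mu e^{-t}$, and its score has the closed form $-x + \mu e^{-t}\tanh(\mu e^{-t}x)$. The helper names are hypothetical. As a simple heuristic, not the paper's argument, the slope of this score at the origin, $-1 + \mu^2 e^{-2t}$, grows quadratically in $\mu$, so wider separation produces a much sharper function to learn.

```python
import numpy as np

def sample_two_modes(mu, n, rng):
    """Draw n points from 0.5 N(-mu, 1) + 0.5 N(mu, 1); the modes are 2*mu apart."""
    return rng.choice([-mu, mu], size=n) + rng.normal(size=n)

def perturbed_score(x, mu, t):
    """Score of p_t = 0.5 N(-mu e^{-t}, 1) + 0.5 N(mu e^{-t}, 1) (assumed OU forward process)."""
    a = mu * np.exp(-t)
    return -x + a * np.tanh(a * x)

def score_slope_at_origin(mu, t):
    """Derivative of perturbed_score at x = 0, i.e. -1 + (mu e^{-t})^2."""
    return float(-1.0 + (mu * np.exp(-t)) ** 2)

for mu in (3.0, 15.0):                     # the settings of Figures 4 and 5
    print(mu, score_slope_at_origin(mu, 0.0))
# prints "3.0 8.0" then "15.0 224.0"
```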

Theorems & Definitions (27)

  • Remark 1
  • Definition 1: KL divergence
  • Theorem 1
  • Remark 2
  • Remark 3
  • Remark 4
  • Theorem 2
  • Remark 5
  • Lemma 1: Forward perturbation estimates
  • Proof
  • ...and 17 more
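In its standard form, the KL divergence named in Definition 1 is $D_{KL}(p \| q) = \int p(\bm{x}) \log \frac{p(\bm{x})}{q(\bm{x})}\, d\bm{x}$, the quantity in which the generalization gap $D_{KL}(p_0 \| p_{0, \hat{\bm{\theta}}_n(\tau)})$ is measured. For one-dimensional densities it can be approximated on a grid. The sketch below uses hypothetical helper names and the trapezoid rule, assumes both densities are negligible outside the grid, and checks itself against the Gaussian closed form.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def kl_divergence_1d(p, q, x):
    """Trapezoid-rule estimate of D_KL(p || q) from densities p, q sampled on grid x.
    Uses the convention 0 * log(0 / q) = 0; q must be positive wherever p is."""
    tiny = np.finfo(float).tiny
    log_ratio = np.log(np.maximum(p, tiny)) - np.log(np.maximum(q, tiny))
    integrand = np.where(p > 0, p * log_ratio, 0.0)
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(x)))

def kl_gaussians(m1, s1, m2, s2):
    """Closed-form D_KL(N(m1, s1^2) || N(m2, s2^2)), used as a sanity check."""
    return float(np.log(s2 / s1) + (s1**2 + (m1 - m2) ** 2) / (2.0 * s2**2) - 0.5)

x = np.linspace(-12.0, 12.0, 24001)
p = gaussian_pdf(x, 0.0, 1.0)
q = gaussian_pdf(x, 1.0, 1.0)
numeric = kl_divergence_1d(p, q, x)
exact = kl_gaussians(0.0, 1.0, 1.0, 1.0)    # equals 0.5
print(numeric, exact)
```

The grid-based estimate applies to any pair of sampled 1D densities, such as a mixture target and a fitted model; the Gaussian closed form only serves to validate it.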