
On the Generalization of Diffusion Model

Mingyang Yi, Jiacheng Sun, Zhenguo Li

TL;DR

This work introduces an information-theoretic excess-risk framework to study the generalization of generative models, with a focus on diffusion models. It decomposes generalization error from optimization error and demonstrates that empirical optima under standard diffusion objectives can memorize training data when reverse updates are deterministic. The authors show that training-induced optimization bias acts as a regularizer, enabling high-quality generation with extrapolation, and they propose a forward-time objective based on estimating $\mathbb{E}[\boldsymbol{\xi}_{t,t-1}|\boldsymbol{x}_t]$ (via randomized time gaps and Tweedie’s formula) to further reduce memorization without sacrificing fidelity. Empirical results on CIFAR-10 and CelebA illustrate the trade-off between optimization and generalization and validate that the new objective yields diffusion models with improved generalization while maintaining competitive sample quality.
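For reference, Tweedie's formula can be written out in the common DDPM notation. This is an assumption of this note rather than the paper's own statement: the forward marginal is taken to be $\boldsymbol{x}_t = \sqrt{\bar{\alpha}_t}\,\boldsymbol{x}_0 + \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon}$ with $\boldsymbol{\epsilon} \sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{I})$ and marginal density $p_t$, and the paper's quantity $\boldsymbol{\xi}_{t,t-1}$ may be defined differently. Under that assumption, the conditional means of the clean data and of the noise are both determined by the score:

$$
\mathbb{E}[\boldsymbol{x}_0 \mid \boldsymbol{x}_t] = \frac{\boldsymbol{x}_t + (1-\bar{\alpha}_t)\,\nabla_{\boldsymbol{x}_t}\log p_t(\boldsymbol{x}_t)}{\sqrt{\bar{\alpha}_t}},
\qquad
\mathbb{E}[\boldsymbol{\epsilon} \mid \boldsymbol{x}_t] = -\sqrt{1-\bar{\alpha}_t}\,\nabla_{\boldsymbol{x}_t}\log p_t(\boldsymbol{x}_t).
$$

The second identity follows from the first by substituting $\boldsymbol{\epsilon} = (\boldsymbol{x}_t - \sqrt{\bar{\alpha}_t}\,\boldsymbol{x}_0)/\sqrt{1-\bar{\alpha}_t}$ and taking the conditional expectation.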

Abstract

Diffusion probabilistic generative models are widely used to generate high-quality data. Though they can synthesize data that does not exist in the training set, the rationale behind such generalization is still unexplored. In this paper, we formally define the generalization of a generative model, measured by the mutual information between the generated data and the training set. The definition originates from the intuition that a model which generates data less correlated with the training set exhibits better generalization ability. Meanwhile, we show that for the empirical optimal diffusion model, the data generated by a deterministic sampler are all highly related to the training set, so the model generalizes poorly. This result contradicts the observed extrapolation ability (generating unseen data) of trained diffusion models, which approximate the empirical optima. To understand this contradiction, we empirically verify the difference between the sufficiently trained diffusion model and the empirical optima. We find that, even after sufficient training, a slight difference between them remains, and this difference is critical to making the diffusion model generalizable. Moreover, we propose another training objective whose empirical optimal solution has no potential generalization problem. We empirically show that the proposed training objective returns a model similar to the original one, which further verifies the generalization ability of the trained diffusion model.
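The memorization claim for the empirical optimum can be illustrated on toy data. The sketch below is not the authors' code. It assumes a DDPM-style forward process $\boldsymbol{x}_t = \sqrt{\bar{\alpha}_t}\,\boldsymbol{x}_0 + \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon}$ over a finite training set, for which the minimizer of the noise-prediction loss has a well-known closed form (a softmax-weighted average of training points), and it uses a DDIM-style deterministic update as the sampler. The function names, the linear noise schedule, and the four-point training set are all hypothetical choices made for illustration.

```python
import math
import random


def make_alpha_bars(num_steps, beta_min=1e-4, beta_max=0.2):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_min + (beta_max - beta_min) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def empirical_optimal_eps(x_t, train, alpha_bar):
    """Closed-form optimum of the empirical noise-prediction loss at one noise level.

    Returns (eps, x0_hat), where x0_hat = E[x_0 | x_t] under the empirical
    distribution that puts equal mass on each point of `train`.
    """
    a, s2 = math.sqrt(alpha_bar), 1.0 - alpha_bar
    # Gaussian log-likelihood of x_t under each training point (up to a constant).
    logits = [
        -sum((xt - a * xi) ** 2 for xt, xi in zip(x_t, x0)) / (2.0 * s2)
        for x0 in train
    ]
    m = max(logits)  # subtract the max for numerical stability
    w = [math.exp(l - m) for l in logits]
    z = sum(w)
    x0_hat = [
        sum(wi * x0[d] for wi, x0 in zip(w, train)) / z for d in range(len(x_t))
    ]
    eps = [(xt - a * xh) / math.sqrt(s2) for xt, xh in zip(x_t, x0_hat)]
    return eps, x0_hat


def ddim_sample(x_T, train, alpha_bars):
    """Deterministic reverse pass (DDIM-style, no injected noise)."""
    x = list(x_T)
    for t in range(len(alpha_bars) - 1, -1, -1):
        eps, x0_hat = empirical_optimal_eps(x, train, alpha_bars[t])
        ab_prev = alpha_bars[t - 1] if t > 0 else 1.0
        x = [
            math.sqrt(ab_prev) * xh + math.sqrt(1.0 - ab_prev) * e
            for xh, e in zip(x0_hat, eps)
        ]
    return x


def nearest_train_distance(x, train):
    """Euclidean distance from x to its closest training point."""
    return min(math.dist(x, x0) for x0 in train)


if __name__ == "__main__":
    random.seed(0)
    toy_train = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
    schedule = make_alpha_bars(50)
    for _ in range(5):
        x_T = [random.gauss(0.0, 1.0) for _ in range(2)]
        sample = ddim_sample(x_T, toy_train, schedule)
        print(sample, nearest_train_distance(sample, toy_train))
```

In this toy setting the printed distances indicate how close each deterministic sample lands to a training point. A trained network only approximates this closed-form optimum, which is the gap the abstract identifies as the source of generalization.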

Paper Structure (27 sections, 23 theorems, 89 equations, 11 figures)


Key Result

Theorem 1

If the generated data $\boldsymbol{z}^{j}$ in the excess risk (Definition 1) are conditionally independent of each other given the training set $\boldsymbol{S}$, and $\mathcal{F}$ has a countable dense subset under the $L_{\infty}$ distance, then the excess risk becomes

Figures (11)

  • Figure 1: The first panel shows the distance $\|\boldsymbol{x}_{t} - \boldsymbol{x}_{t}^{*}\|$ averaged per dimension (3$\times$32$\times$32) over 50k generated CIFAR10 samples. The second panel shows a randomly sampled batch of $\boldsymbol{x}_{t}$ and $\boldsymbol{x}_{t}^{*}$ with the same $\boldsymbol{x}_{T}=\boldsymbol{x}_{T}^{*}$ and $T=50$.
  • Figure 2: The two top panels show $\boldsymbol{x}_{0}$ from the test and training sets of CIFAR10, respectively. The bottom panels show data generated by the trained model (left) and by the empirical optima (right).
  • Figure 3: Comparison of $\boldsymbol{x}_{t}$, $\hat{\boldsymbol{x}}_{t}$, and $\boldsymbol{x}_{t}^{*}$, generated respectively by the diffusion model trained with the standard empirical objective, the diffusion model trained with the proposed objective, and the empirical optima.
  • Figure 4: Generated CIFAR10 samples, starting from noisy data constructed from the training set. From left to right: data generated by diffusion models trained with the proposed objective and with the standard empirical objective.
  • Figure 5: Generated CelebA samples, starting from noisy data constructed from the training set. From left to right: data generated by diffusion models trained with the proposed objective and with the standard empirical objective.
  • ...and 6 more figures

Theorems & Definitions (40)

  • Definition 1: Excess Risk
  • Theorem 1
  • Proposition 1
  • Proposition 2
  • Theorem 2
  • Theorem 3
  • Proposition 3
  • Proposition 4
  • Lemma 1
  • Proposition 5
  • ...and 30 more