Generative diffusion models from a PDE perspective

Fei Cao, Kimball Johnston, Thomas Laurent, Justin Le, Sébastien Motsch

TL;DR

From a PDE perspective, the paper derives the reverse diffusion equation corresponding to the forward Ornstein–Uhlenbeck process and shows that the exact reverse flow preserves the support of the original data distribution $ρ_0$ and thus cannot generate new samples. It connects the PDE derivation to the traditional SDE view, presenting the reverse SDE with a time-dependent generator and two equivalent representations: a score-based form involving $∇ \log ρ$ and a mean-field form using $\langle X_0\rangle$. The authors analyze the Dirac initial condition, an explicitly solvable case that demonstrates how perfect reversal returns the original samples (an overfitting regime), and underscore the necessity of imperfect learning for real generalization. They further discuss practical aspects via score-matching and stable diffusion, including kernel interpretations when $ρ_0$ is empirical, and argue that successful generalization relies on the interplay between reverse dynamics and model approximation (e.g., UNets) rather than on the PDE reversal alone.

Abstract

Diffusion models have become the de facto framework for generating new datasets. The core of these models lies in the ability to reverse a diffusion process in time. The goal of this manuscript is to explain, from a PDE perspective, how this method works and how to derive the PDE governing the reverse dynamics as well as to study its solution analytically. By linking forward and reverse dynamics, we show that the reverse process's distribution has its support contained within the original distribution. Consequently, diffusion methods, in their analytical formulation, do not inherently regularize the original distribution, and thus, there is no generalization principle. This raises a question: where does generalization arise, given that in practice it does occur? Moreover, we derive an explicit solution to the reverse process's SDE under the assumption that the starting point of the forward process is fixed. This provides a new derivation that links two popular approaches to generative diffusion models: stable diffusion (discrete dynamics) and the score-based approach (continuous dynamics). Finally, we explore the case where the original distribution consists of a finite set of data points. In this scenario, the reverse dynamics are explicit (i.e., the loss function has a clear minimizer), and solving the dynamics fails to generate new samples: the dynamics converge to the original samples. In a sense, solving the minimization problem exactly is "too good for its own good" (i.e., an overfitting regime).
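
The overfitting regime described above can be illustrated numerically. The following is a minimal sketch, not the authors' code. It assumes the forward process is the Ornstein–Uhlenbeck SDE $dX_t = -X_t\,dt + \sqrt{2}\,dB_t$ (this normalization is an assumption), so that $X_t$ given $X_0$ is Gaussian with mean $e^{-t}X_0$ and variance $1-e^{-2t}$. When $ρ_0$ is a finite set of data points, the score $∇ \log ρ_t$ is then an explicit softmax-weighted average, and the reverse SDE $dY_s = [Y_s + 2∇ \log ρ_{T-s}(Y_s)]\,ds + \sqrt{2}\,dB_s$ can be integrated with an Euler–Maruyama scheme. The function names `score_empirical` and `reverse_sample`, the toy data, and all step-size choices are hypothetical.

```python
import numpy as np

def score_empirical(x, data, t):
    """Exact score of rho_t when rho_0 is the empirical measure on `data`.

    Assumption: forward OU process dX = -X dt + sqrt(2) dB, hence
    X_t | X_0 ~ N(exp(-t) X_0, (1 - exp(-2t)) I).
    x: (M, d) query points, data: (N, d) training points.
    """
    mean = np.exp(-t) * data                      # (N, d) Gaussian centers
    var = -np.expm1(-2.0 * t)                     # 1 - exp(-2t), computed stably
    diff = mean[None, :, :] - x[:, None, :]       # (M, N, d)
    logw = -np.sum(diff**2, axis=2) / (2.0 * var) # (M, N) log mixture weights
    logw -= logw.max(axis=1, keepdims=True)       # stabilize the softmax
    w = np.exp(logw)
    w /= w.sum(axis=1, keepdims=True)
    return np.einsum("mn,mnd->md", w, diff) / var

def reverse_sample(data, n_samples=200, T=5.0, t_min=1e-3, n_steps=1000, seed=0):
    """Euler-Maruyama integration of the reverse SDE with the exact score."""
    rng = np.random.default_rng(seed)
    y = rng.standard_normal((n_samples, data.shape[1]))  # start from N(0, I)
    # Geometric grid in forward time: smaller steps where the score is stiff.
    ts = np.geomspace(T, t_min, n_steps + 1)
    for t, t_next in zip(ts[:-1], ts[1:]):
        h = t - t_next
        drift = y + 2.0 * score_empirical(y, data, t)
        y = y + h * drift + np.sqrt(2.0 * h) * rng.standard_normal(y.shape)
    return y

# Toy data set: four well-separated points in the plane (illustrative only).
data = np.array([[2.0, 0.0], [-2.0, 0.0], [0.0, 2.0], [0.0, -2.0]])
samples = reverse_sample(data)
# Distance from each generated sample to its nearest training point.
nearest = np.linalg.norm(samples[:, None, :] - data[None, :, :], axis=2).min(axis=1)
print("largest distance to a training point:", nearest.max())
```

In this sketch every generated sample ends up close to one of the four training points, which is the behavior the abstract calls an overfitting regime: with the exact score there is nothing to interpolate between data points. The geometric time grid is a design choice made here for numerical stability near $t = 0$, where the score scales like $1/t$.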

Paper Structure

This paper contains 19 sections, 13 theorems, 116 equations, 10 figures.

Key Result

Proposition 2.1

The function $\reflectbox{\vec{\reflectbox{v}}}(x,s)$ defined in \ref{eq:v_reverse} satisfies the following Kolmogorov backward equation:

Figures (10)

  • Figure 1: Illustration of various generative models. Given a data sample $\{{\bf x}_i\}_{i=1}^N$ from an unknown distribution $ρ_0$, the goal is to generate new samples $\{\widetilde{\bf x}_i\}_{i=1}^M$. In kernel density estimation (A), a density $q$ is first estimated, and new samples are then drawn from this estimated distribution. In the auto-encoder approach (B), an encoder-decoder network is trained to reconstruct the samples. Diffusion models (C) work similarly to the auto-encoder method, except only the decoder must be learned, and the encoder does not reduce dimensionality.
  • Figure 2: The forward process as a PDE \ref{eq:forward_PDE} transforms any distribution $ρ_0$ into a normal distribution $\mathcal{N}$. The reverse process does the opposite and transforms a normal distribution $\mathcal{N}$ into (almost) $ρ_0$.
  • Figure 3: Reversing the forward process \ref{eq:forward_PDE} by reversing time in a naive way \ref{eq:reverse_rho} yields an ill-posed problem \ref{eq:naive_pde} even though the curve $s \mapsto ρ(x,T_*-s)$ is a solution. The true reverse dynamics \ref{eq:reverse_diffusion} preserves this curve but also stabilizes the solution. As a consequence, the normal distribution $\mathcal{N}$ is no longer an equilibrium distribution for the (non-autonomous) reverse dynamics.
  • Figure 4: The forward process as an SDE \ref{eq:SDE} transforms the initial law $ρ_0$ into a standard normal distribution $\mathcal{N}$, while the reverse process performs the opposite transformation.
  • Figure 5: Illustration of Theorem \ref{thm:past_future} and the identity \ref{eq:past_future}. The (density of the) probability that $X_{t_1}=x$ (past) knowing that $X_{t_2}=y$ (future) is the same as the probability that $\reflectbox{\vec{\reflectbox{X}}}_{s_1}=x$ (future) knowing that $\reflectbox{\vec{\reflectbox{X}}}_{s_2}=y$ (past).
  • ...and 5 more figures

Theorems & Definitions (13)

  • Proposition 2.1
  • Lemma 2.2
  • Theorem 1
  • Corollary 3.1
  • Corollary 3.2
  • Corollary 3.3
  • Lemma 4.1
  • Proposition 4.2
  • Theorem 2
  • Proposition 4.3
  • ...and 3 more