An optimal control perspective on diffusion-based generative modeling

Julius Berner; Lorenz Richter; Karen Ullrich

TL;DR

The paper connects stochastic optimal control with diffusion-based generative models by deriving a Hamilton-Jacobi-Bellman (HJB) equation for the time evolution of the log-densities of the SDE marginals and by showing that the evidence lower bound (ELBO) of diffusion models follows from the verification theorem of control theory. It recasts diffusion modeling as the minimization of a Kullback-Leibler (KL) divergence on path space and introduces the time-reversed diffusion sampler (DIS) for sampling from unnormalized densities, linking diffusion models to sampling problems in statistics and the computational sciences. In numerical experiments on Gaussian mixture, Funnel, and double-well densities, DIS matches or outperforms other diffusion-based samplers, and a comparison to Schrödinger half-bridges clarifies how the methods differ. The authors point to future directions including PDE-based solvers, alternative divergences on path space, and extensions to Schrödinger bridges.

Abstract

We establish a connection between stochastic optimal control and generative models based on stochastic differential equations (SDEs), such as recently developed diffusion probabilistic models. In particular, we derive a Hamilton-Jacobi-Bellman equation that governs the evolution of the log-densities of the underlying SDE marginals. This perspective allows us to transfer methods from optimal control theory to generative modeling. First, we show that the evidence lower bound is a direct consequence of the well-known verification theorem from control theory. Further, we can formulate diffusion-based generative modeling as a minimization of the Kullback-Leibler divergence between suitable measures in path space. Finally, we develop a novel diffusion-based method for sampling from unnormalized densities, a problem that frequently occurs in statistics and computational sciences. We demonstrate that our time-reversed diffusion sampler (DIS) can outperform other diffusion-based sampling approaches on multiple numerical examples.

Paper Structure (48 sections, 16 theorems, 145 equations, 15 figures, 2 tables, 1 algorithm)


Introduction
Related work
Diffusion models:
Sampling from (unnormalized) densities:
Schrödinger bridges:
Notation
SDE-based generative modeling as an optimal control problem
PDE perspective: HJB equation for log-density
Optimal control perspective: ELBO derivation
Path space perspective: KL divergence in continuous time
Connection to denoising score matching objective
Sampling from unnormalized densities
Comparison to Schrödinger half-bridges
Numerical examples
Examples
...and 33 more sections

Key Result

Lemma 2.3

Let us define $V \coloneqq -\log \overleftarrow{p}_{X}$. Then $V$ is a solution to the HJB equation with terminal condition $V(\cdot,T) = -\log p_{X_0}$.
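
To illustrate why a negative log-density can satisfy an HJB-type equation, the following worked computation treats a simplified special case. The assumptions are ours and not the paper's general setting: the process has zero drift and identity diffusion coefficient, so its density $p$ solves the heat equation, and the overset arrow denotes time reversal, $\overleftarrow{p}(\cdot,t) = p(\cdot,T-t)$. The general equation in the paper contains further drift-dependent terms that are not reproduced here.

```latex
\begin{align*}
  % Assumed special case: zero drift, identity diffusion (heat equation).
  \partial_t p &= \tfrac{1}{2}\Delta p, \\
  % Time reversal flips the sign of the time derivative.
  \overleftarrow{p}(\cdot,t) \coloneqq p(\cdot,T-t)
    \quad&\Longrightarrow\quad
    \partial_t \overleftarrow{p} = -\tfrac{1}{2}\Delta \overleftarrow{p}, \\
  % Substitute \overleftarrow{p} = e^{-V}.
  V \coloneqq -\log \overleftarrow{p}
    \quad&\Longrightarrow\quad
    \partial_t \overleftarrow{p} = -e^{-V}\,\partial_t V, \qquad
    \Delta \overleftarrow{p} = e^{-V}\bigl(\|\nabla V\|^2 - \Delta V\bigr), \\
  % Divide by -e^{-V} to obtain a nonlinear (HJB-type) equation for V.
  \partial_t V &= -\tfrac{1}{2}\Delta V + \tfrac{1}{2}\|\nabla V\|^2,
    \qquad V(\cdot,T) = -\log p(\cdot,0).
\end{align*}
```

The quadratic term $\tfrac{1}{2}\|\nabla V\|^2$ is what makes the equation nonlinear. In this special case, the terminal condition has the same form as the one stated in the lemma.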

Figures (15)

Figure 1: The connection between diffusion models and optimal control is outlined in Section \ref{sec: different perspectives}. Three possible implications of this connection are the derivation of the ELBO via the verification theorem (Section \ref{sec:control}), a path space measure interpretation of diffusion models (Section \ref{sec:path}), and a novel approach for sampling from (unnormalized) densities (Section \ref{sec: sampling from unnormalized densities}).
Figure 2: For a stochastic process $Y$ and its time-reversed process $X$, we show the density $p_Y(\cdot,t)=p_X(\cdot,T-t)$ (top), the drift $f(\cdot,t)$ of $Y$ (middle), and the drift $\mu(\cdot,t)$ of $X$ (bottom) for $t\in \left\{0,\tfrac{T}{3},\tfrac{2T}{3},T \right\}$, see Equations \ref{eq:rev_def} and \ref{eq: inference SDE}.
Figure 3: Illustration of our DIS algorithm for the double well example in Section \ref{sec: numerical examples} with $d=20$, $w=5$, $\delta=3$. The process $X^u$ starts from a Gaussian (approximately distributed as $Y_T$) and the control $u$ is trained such that the distribution of $X_T^u$ at terminal time approximates the target density $\rho/\mathcal{Z}$. The plot displays some trajectories as well as histograms at initial and terminal times. In the right panel, we show a kernel density estimate of a two-dimensional marginal of the corresponding double well.
Figure 4: We compare our DIS method against PIS on the ability to compute the log-normalizing constant $\log \mathcal{Z}$ (median and interquartile range over $10$ training seeds) when using $N\in\{100,200,400,800\}$ steps of the Euler-Maruyama scheme. Our method outperforms PIS clearly for the GMM and Funnel examples and offers a slight improvement for the DW example. Each model has been trained with each $N$ for $1/4$ of the total gradient steps (starting with $100$ and ending with $800$ steps). See also Figure \ref{fig:logz_comp} for a comparison of models trained on a single step size.
Figure 5: We compare our DIS method against PIS on the ability to estimate the expectations $\mathbbm{E}\left[\|Y_0\|^2\right]$ and $\mathbbm{E}\left[\|Y_0\|_1\right]$ and the average standard deviation $\frac{1}{d}\sum_{i=1}^d\sqrt{\mathbbm{V}\left[(Y_0)_i\right]}$, see Equation \ref{eq: expectation of interest}. Each model has been trained with each $N\in\{100,200,400,800\}$ for $1/4$ of the total gradient steps (starting with $100$ and ending with $800$ steps). We compare the methods using $N$ steps of the Euler-Maruyama scheme when computing our estimates (median and interquartile range over $10$ training seeds). Our method outperforms PIS in all considered settings in terms of accuracy.
...and 10 more figures
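
The captions of Figures 4 and 5 refer to $N$ steps of the Euler-Maruyama scheme. The following is a minimal sketch of that discretization for a generic SDE $\mathrm{d}X_t = \mu(X_t,t)\,\mathrm{d}t + \sigma\,\mathrm{d}B_t$ with constant scalar $\sigma$. It is not the paper's implementation and does not train a control: the function name `euler_maruyama`, its arguments, and the Ornstein-Uhlenbeck test drift are assumptions chosen for illustration.

```python
import numpy as np


def euler_maruyama(drift, sigma, x0, T, n_steps, rng):
    """Simulate dX_t = drift(X_t, t) dt + sigma dB_t on [0, T].

    x0 has shape (batch, d); the terminal samples X_T are returned
    with the same shape. `sigma` is a constant scalar (an assumption).
    """
    x = np.array(x0, dtype=float)
    dt = T / n_steps
    for k in range(n_steps):
        t = k * dt
        # Brownian increment over one step has standard deviation sqrt(dt).
        noise = rng.standard_normal(x.shape)
        x = x + drift(x, t) * dt + sigma * np.sqrt(dt) * noise
    return x


# Hypothetical example: Ornstein-Uhlenbeck process dX = -X dt + sqrt(2) dB,
# whose stationary distribution is the standard normal.
rng = np.random.default_rng(0)
x0 = np.zeros((20000, 2))
x_T = euler_maruyama(lambda x, t: -x, np.sqrt(2.0), x0, T=5.0, n_steps=400, rng=rng)
sample_var = x_T.var(axis=0)  # should be close to 1 in each coordinate
```

Increasing `n_steps` reduces the discretization bias at the cost of more drift evaluations, which is the trade-off explored over $N\in\{100,200,400,800\}$ in the captions above.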

Theorems & Definitions (26)

Remark 2.2: Reverse-time SDE
Lemma 2.3: HJB equation for log-density
Theorem 2.4: Verification theorem
Corollary 2.5: Evidence lower bound
Proposition 2.6: Optimal path space measure
Corollary 3.1: Reverse KL divergence
Theorem A.1: Reverse-time SDE
proof
Lemma A.2: Hopf-Cole transformation
Theorem A.3: Verification theorem for general HJB equation
...and 16 more
