Denoising Diffusion Samplers

Francisco Vargas, Will Grathwohl, Arnaud Doucet

TL;DR

We address sampling from unnormalized densities and estimating the normalizing constant Z by diffusing the target toward a Gaussian with a forward process and learning an approximation of its time reversal, the Denoising Diffusion Sampler (DDS), via a KL-based objective. The paper develops both continuous-time and discrete-time DDS, derives integrators that preserve the ELBO, and connects to probability-flow ODEs and Schrödinger bridges. Theoretical guarantees show convergence under mild score-error assumptions, with bounds comparable to Langevin methods. Empirically, DDS perform competitively with AIS/SMC and PIS on a range of challenging targets, with notable stability advantages and robust mode handling.

Abstract

Denoising diffusion models are a popular class of generative models providing state-of-the-art results in many domains. One gradually adds noise to data using a diffusion to transform the data distribution into a Gaussian distribution. Samples from the generative model are then obtained by simulating an approximation of the time-reversal of this diffusion initialized by Gaussian samples. Practically, the intractable score terms appearing in the time-reversed process are approximated using score matching techniques. We explore here a similar idea to sample approximately from unnormalized probability density functions and estimate their normalizing constants. We consider a process where the target density diffuses towards a Gaussian. Denoising Diffusion Samplers (DDS) are obtained by approximating the corresponding time-reversal. While score matching is not applicable in this context, we can leverage many of the ideas introduced in generative modeling for Monte Carlo sampling. Existing theoretical results from denoising diffusion models also provide theoretical guarantees for DDS. We discuss the connections between DDS, optimal control and Schrödinger bridges and finally demonstrate DDS experimentally on a variety of challenging sampling tasks.
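
The forward half of this construction, in which the target density diffuses towards a Gaussian, can be illustrated numerically. The sketch below is not the authors' code: it assumes an Ornstein-Uhlenbeck noising process with a constant rate `beta` and a toy bimodal target, and the names `sample_bimodal` and `forward_noise` are hypothetical. It shows only the noising direction; learning the time-reversal, which is what DDS does, is not attempted here.

```python
import numpy as np


def sample_bimodal(n, rng):
    """Draw n samples from a toy 1D target (an assumption for illustration):
    an equal mixture of N(-3, 0.5^2) and N(+3, 0.5^2)."""
    signs = rng.choice([-1.0, 1.0], size=n)
    return 3.0 * signs + 0.5 * rng.standard_normal(n)


def forward_noise(x0, t, beta=1.0, sigma=1.0, rng=None):
    """Sample x_t given x_0 under dx = -beta * x dt + sigma * sqrt(2 * beta) dB.

    For constant beta the transition is Gaussian in closed form:
    x_t | x_0 ~ N(exp(-beta * t) * x_0, sigma^2 * (1 - exp(-2 * beta * t))).
    """
    rng = np.random.default_rng() if rng is None else rng
    decay = np.exp(-beta * t)
    lam = 1.0 - decay**2  # fraction of the stationary variance reached by time t
    return decay * x0 + sigma * np.sqrt(lam) * rng.standard_normal(x0.shape)


rng = np.random.default_rng(0)
x0 = sample_bimodal(100_000, rng)
for t in (0.0, 0.5, 2.0, 5.0):
    xt = forward_noise(x0, t, rng=rng)
    # Mean and variance drift towards those of N(0, sigma^2) as t grows.
    print(f"t={t}: mean={xt.mean():+.3f}, var={xt.var():.3f}")
```

With these assumed settings the two modes are washed out as `t` grows and the marginal approaches N(0, sigma^2), which is why a sampler for the target can start from Gaussian samples and run an approximation of the reversed dynamics.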

Paper Structure

This paper contains 53 sections, 9 theorems, 99 equations, 12 figures, 7 tables.

Key Result

Proposition 1

The Radon--Nikodym derivative $\frac{\mathrm{d}\mathcal{Q}^{\theta}}{\mathrm{d}\mathcal{P}^{\textup{ref}}}(y_{[0,T]})$ satisfies, under $\mathcal{Q}^{\theta}$, ... From the identity $\mathrm{KL}(\mathcal{Q}^\theta\|\mathcal{P})=\mathrm{KL}(\mathcal{Q}^\theta\|\mathcal{P}^{\textup{ref}})+\mathbb{E}_{y_T \sim q^{\theta}_0}\left[\ln \frac{p^{\textup{ref}}_0(y_T)}{p_0(y_T)}\right]$, it follows that ...
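
The KL identity quoted in Proposition 1 can be unpacked in two lines. The sketch below rests on assumptions that this excerpt does not state explicitly: that $\mathcal{P}$ and $\mathcal{P}^{\textup{ref}}$ are driven by the same forward dynamics and differ only in their initial densities $p_0$ and $p^{\textup{ref}}_0$, and that the reversed path is indexed as $y_t = x_{T-t}$, so that $y_T$ plays the role of the forward initial state.

```latex
\begin{align*}
\frac{\mathrm{d}\mathcal{P}^{\textup{ref}}}{\mathrm{d}\mathcal{P}}(y_{[0,T]})
  &= \frac{p^{\textup{ref}}_0(y_T)}{p_0(y_T)}
  && \text{(shared transitions cancel)} \\
\ln\frac{\mathrm{d}\mathcal{Q}^{\theta}}{\mathrm{d}\mathcal{P}}(y_{[0,T]})
  &= \ln\frac{\mathrm{d}\mathcal{Q}^{\theta}}{\mathrm{d}\mathcal{P}^{\textup{ref}}}(y_{[0,T]})
   + \ln\frac{p^{\textup{ref}}_0(y_T)}{p_0(y_T)}
  && \text{(chain rule for densities)}
\end{align*}
```

Taking the expectation of the second line under $\mathcal{Q}^{\theta}$ turns the left-hand side into $\mathrm{KL}(\mathcal{Q}^\theta\|\mathcal{P})$ and the first term on the right into $\mathrm{KL}(\mathcal{Q}^\theta\|\mathcal{P}^{\textup{ref}})$, which gives the stated identity.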

Figures (12)

  • Figure 1: Training loss per hyperparameter: PIS (left) vs DDS (right).
  • Figure 2: $\ln Z$ estimate (median plus upper/lower quartiles) as a function of number of steps $K$ - a) Funnel, b) LGCP, c) Logistic Ionosphere dataset. Yellow dotted line is MF-VI and dashed magenta is the gold standard.
  • Figure 3: $\ln Z$ estimate as a function of number of steps $K$ - a) Logistic Sonar dataset, b) Brownian motion, c) NICE. Yellow dotted line is MF-VI and dashed magenta is the gold standard.
  • Figure 4: Magnitude of the learnt neural net approximation of the drift $\nabla_x\ln \phi_{t}(x)$ (see (\ref{eq:drifteq})) as a function of $t$.
  • Figure 5: Results on pretrained VAE from Arbel et al. (2021).
  • ...and 7 more figures

Theorems & Definitions (12)

  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Corollary 1
  • Corollary 2
  • proof
  • Proposition 4
  • proof
  • Proposition 5
  • Proposition 6
  • ...and 2 more