Iterated Denoising Energy Matching for Sampling from Boltzmann Densities

Tara Akhound-Sadegh; Jarrid Rector-Brooks; Avishek Joey Bose; Sarthak Mittal; Pablo Lemos; Cheng-Hao Liu; Marcin Sendera; Siamak Ravanbakhsh; Gauthier Gidel; Yoshua Bengio; Nikolay Malkin; Alexander Tong

TL;DR

This paper proposes Iterated Denoising Energy Matching (iDEM), an iterative algorithm that uses a novel stochastic score matching objective leveraging solely the energy function and its gradient -- and no data samples -- to train a diffusion-based sampler.

Abstract

Efficiently generating statistically independent samples from an unnormalized probability distribution, such as equilibrium samples of many-body systems, is a foundational problem in science. In this paper, we propose Iterated Denoising Energy Matching (iDEM), an iterative algorithm that uses a novel stochastic score matching objective leveraging solely the energy function and its gradient -- and no data samples -- to train a diffusion-based sampler. Specifically, iDEM alternates between (I) sampling regions of high model density from a diffusion-based sampler and (II) using these samples in our stochastic matching objective to further improve the sampler. iDEM scales to high dimensions because the inner matching objective is simulation-free and requires no MCMC samples. Moreover, by leveraging the fast mode-mixing behavior of diffusion, iDEM smooths out the energy landscape, enabling efficient exploration and learning of an amortized sampler. We evaluate iDEM on a suite of tasks ranging from standard synthetic energy functions to invariant $n$-body particle systems. We show that the proposed approach achieves state-of-the-art performance on all metrics and trains $2$ to $5\times$ faster, which allows it to be the first method to train using energy on the challenging $55$-particle Lennard-Jones system.
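
The alternation between steps (I) and (II) can be sketched as a pair of nested loops. This is only a structural sketch under stated assumptions: the function name and every argument (`simulate_sampler`, `score_target`, `update_model`, `sigma`, the buffer capacity) are hypothetical placeholders supplied by the caller, and the noising step assumes a variance-exploding form $x_t = x_0 + \sigma_t \epsilon$. It is not the authors' implementation.

```python
import numpy as np

def idem_loop_sketch(simulate_sampler, score_target, update_model, sigma,
                     num_outer, num_inner, batch_size, buffer_capacity, rng):
    """Structural sketch of the outer/inner alternation (hypothetical names).

    simulate_sampler(n)       -> (n, d) samples from the current sampler
    score_target(x_t, t)      -> (n, d) regression targets (score estimates)
    update_model(x_t, t, tgt) -> scalar loss after one regression step
    sigma(t)                  -> noise scale, broadcastable against x_t
    """
    buffer = None
    losses = []
    for _ in range(num_outer):
        # (I) Outer loop: draw samples from the current sampler and store
        # them, keeping only the most recent `buffer_capacity` entries.
        new_samples = simulate_sampler(batch_size)
        if buffer is None:
            buffer = new_samples
        else:
            buffer = np.concatenate([buffer, new_samples])
        buffer = buffer[-buffer_capacity:]

        for _ in range(num_inner):
            # (II) Inner loop: no simulation here, only noising buffer
            # samples and regressing the model onto a score target.
            idx = rng.integers(0, len(buffer), size=batch_size)
            x0 = buffer[idx]
            t = rng.uniform(size=(batch_size, 1))
            x_t = x0 + sigma(t) * rng.standard_normal(x0.shape)
            target = score_target(x_t, t)
            losses.append(update_model(x_t, t, target))
    return buffer, losses
```

In this sketch the only step that needs the sampler to be simulated is the call to `simulate_sampler`; the inner loop touches the energy only through `score_target`.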

Paper Structure (37 sections, 42 equations, 13 figures, 6 tables, 1 algorithm)

Introduction
Background and preliminaries
Classical sampling methods
Denoising diffusion
Iterated Denoising Energy Matching
Denoising diffusion with a Boltzmann target (C1)
Amortized sampling with a diffusion sampler (C2)
Incorporating symmetries in iDEM
Experimental results
Main results
Ablation experiments
Related work
Conclusion
Proofs of propositions
iDEM for non-VE noising processes
...and 22 more sections

Figures (13)

Figure 1: iDEM fits a diffusion sampler to a target distribution given by an unnormalized density. In the outer loop, iDEM populates a buffer with samples from the current model $s_\theta$. In the inner loop, iDEM uses the DEM objective (see the section "Denoising diffusion with a Boltzmann target") to regress $s_\theta$ to an estimate of the score at noised samples from the buffer. The inner loop is simulation-free, i.e., requires no SDE integration.
Figure 2: Two ways of estimating the score $\nabla\log p_t(x_t)$. Left: A diffusion model estimates the score convolved with noise by stochastically regressing to the scores of distributions conditioned on $x_0$ (i.e., points $\bullet$, $\bullet$, $\bullet$), weighted by the likelihood $p(x_0 | x_t)$ (indicated by the arrow thickness). This regression requires samples from $\mu_{\text{target}}$. Right: DEM assumes an unnormalized density over $x_0$, expresses the score of the convolved density as an expectation, and regresses to a consistent estimator of this score.
Figure 3: Contour lines for the target distribution, which is a GMM with $40$ modes. Colored points represent samples from each method.
Figure 4: Comparison of the ground truth energy histograms of LJ-13 (left) and LJ-55 (right) and energies of samples generated from various methods. DDS is omitted from both plots while PIS is omitted from LJ-55 as they diverge in these settings.
Figure 5: Left: Log-log plot of bias and MSE vs. $K$ and a regression to the bias. Right: Plot of log bias vs. energy for different $K$. The MSE and bias are calculated for GMM with a linear noise schedule. The standard deviations for the log-transformed values are over $10$ seeds with the variance estimated over $256$ samples. For the plot on the right, the values are averaged over $x_0 \sim p_0$.
...and 8 more figures
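
The right panel of Figure 2 describes writing the score of the noise-convolved density as an expectation. One way to make this concrete, assuming a variance-exploding process $x_t = x_0 + \sigma_t \epsilon$ and an unnormalized target $\exp(-\mathcal{E}(x_0))$, is the identity $\nabla \log p_t(x_t) = \mathbb{E}[-\nabla\mathcal{E}(x)\, e^{-\mathcal{E}(x)}] / \mathbb{E}[e^{-\mathcal{E}(x)}]$ with $x \sim \mathcal{N}(x_t, \sigma_t^2 I)$. The NumPy sketch below replaces both expectations with averages over the same $K$ samples, which gives a biased but consistent estimate. The function name, argument names, and the Gaussian test energy are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

def dem_score_estimate(x_t, sigma_t, energy, grad_energy, num_samples, rng):
    """Self-normalized Monte Carlo estimate of the noised score at x_t.

    x_t:         (d,) point at which to estimate the score
    sigma_t:     scalar noise scale of the (assumed) VE process
    energy:      maps (K, d) -> (K,), the unnormalized negative log-density
    grad_energy: maps (K, d) -> (K, d), gradient of the energy
    """
    # Draw K candidate clean points around x_t.
    noise = rng.standard_normal((num_samples, x_t.shape[0]))
    x0 = x_t + sigma_t * noise

    # Softmax weights proportional to exp(-energy), computed stably by
    # subtracting the maximum before exponentiating.
    neg_energy = -energy(x0)
    weights = np.exp(neg_energy - neg_energy.max())
    weights /= weights.sum()

    # Weighted average of -grad_energy approximates the ratio of expectations.
    return -(weights[:, None] * grad_energy(x0)).sum(axis=0)
```

As a sanity check under these assumptions, a standard Gaussian target with energy $\tfrac{1}{2}\lVert x\rVert^2$ has noised score $-x_t / (1 + \sigma_t^2)$ in closed form, so the estimate can be compared against that value for large $K$.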
