Birth-death dynamics for sampling: Global convergence, approximations and their asymptotics

Yulong Lu; Dejan Slepčev; Lihan Wang

TL;DR

On the torus, smooth, bounded, positive solutions of the kernelized dynamics are proved to converge, on finite time intervals, to the pure birth-death dynamics as the kernel bandwidth shrinks to zero.

Abstract

Motivated by the challenge of sampling Gibbs measures with nonconvex potentials, we study a continuum birth-death dynamics. We improve results in previous works [51,57] and provide weaker hypotheses under which the probability density of the birth-death dynamics governed by Kullback-Leibler divergence or by $\chi^2$ divergence converges exponentially fast to the Gibbs equilibrium measure, with a universal rate that is independent of the potential barrier. To build a practical numerical sampler based on the pure birth-death dynamics, we consider an interacting particle system, which is inspired by the gradient flow structure and the classical Fokker-Planck equation and relies on kernel-based approximations of the measure. Using the technique of $\Gamma$-convergence of gradient flows, we show that on the torus, smooth and bounded positive solutions of the kernelized dynamics converge, on finite time intervals, to the pure birth-death dynamics as the kernel bandwidth shrinks to zero. Moreover, we provide quantitative estimates on the bias of minimizers of the energy corresponding to the kernelized dynamics. Finally, we prove long-time asymptotic results on the convergence of the asymptotic states of the kernelized dynamics towards the Gibbs measure.
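The exponential convergence claimed in the abstract can be illustrated with a small grid simulation. The sketch below is not the authors' sampler: it assumes the KL-governed pure birth-death equation takes the form $\partial_t \rho = -\rho\,(\log(\rho/\pi) - \int \rho \log(\rho/\pi))$, which is not displayed in this summary, and it integrates that equation with forward Euler on a 1D torus. The potential `V`, the initial density, the grid size `n`, and the step size `dt` are all hypothetical choices made for illustration.

```python
import numpy as np

def kl(rho, pi, dx):
    """Discrete KL(rho | pi) on a uniform grid with spacing dx."""
    return float(np.sum(rho * np.log(rho / pi)) * dx)

def birth_death_step(rho, pi, dx, dt):
    """One forward-Euler step of the assumed pure birth-death equation."""
    log_ratio = np.log(rho / pi)
    mean_log_ratio = np.sum(rho * log_ratio) * dx  # integral of rho * log(rho/pi)
    rho_new = rho - dt * rho * (log_ratio - mean_log_ratio)
    return rho_new / (np.sum(rho_new) * dx)  # guard against round-off drift in mass

# Hypothetical setup: double-well potential on the torus [0, 1).
n = 256
dx = 1.0 / n
x = np.arange(n) * dx
V = 3.0 * np.cos(4.0 * np.pi * x)
pi = np.exp(-V)
pi /= np.sum(pi) * dx                      # normalized Gibbs density

rho = 1.0 + 0.5 * np.sin(2.0 * np.pi * x)  # smooth, bounded, positive start
rho /= np.sum(rho) * dx

dt = 0.01
kl_history = [kl(rho, pi, dx)]
for _ in range(1000):                      # integrate up to time T = 10
    rho = birth_death_step(rho, pi, dx, dt)
    kl_history.append(kl(rho, pi, dx))

print(f"KL at t=0: {kl_history[0]:.3e}, KL at t=10: {kl_history[-1]:.3e}")
```

Changing the barrier height (the factor `3.0` in `V`) is one way to probe, informally, the abstract's statement that the rate does not depend on the potential barrier. A grid-based scheme like this only works in low dimension, which is why the paper turns to kernelized particle approximations.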

Paper Structure (14 sections, 21 theorems, 161 equations, 4 figures, 1 table)

Introduction
Birth-death dynamics, and their long-time convergence
Approximations to \eqref{eqn:purebd} that allow for discrete measures
Contributions
Related works
Pure birth-death dynamics governed by relative entropy
Pure birth-death dynamics governed by chi-squared divergence
Kernelized dynamics and its particle approximations
Quantitative distance between minimizers
Well-posedness of gradient flows \eqref{eqn:gfkerbd}
$\Gamma$-convergence of gradient flows
Convergence of asymptotic sets
Particle based schemes
Numerical Examples

Key Result

Lemma 2.1

Suppose $\{\rho_n\}_{n=1}^\infty$ and $\rho$ are measures on $\mathbb{R}^d$ that are all absolutely continuous with respect to some measure $\lambda$. Suppose also that $\rho$ has finite total mass. Then [displayed estimate missing from this extract]. As a consequence of \eqref{eqn:dHdSHequiv}, if we further assume that $\rho_n,\rho$ are probability measures on $\mathbb{R}^d$, then [displayed estimate missing from this extract].

Figures (4)

Figure 1: 1D torus example. Left: evolution of $\mathrm{KL}(\rho^{(\varepsilon)}_t | \pi)$ for various $\varepsilon$; for each fixed $\varepsilon$ it appears to approach a fixed value as $t\to\infty$. Right: the relationship between $\varepsilon$ and $\mathrm{KL}(\rho^{(\varepsilon)}_\infty|\pi)$, which scales like $O(\varepsilon^2)$ as $\varepsilon\to 0$.
Figure 2: Gaussian mixture example. Left: error of observable $f(x,y) = x^2/3+y^2/5$; center: MMD with kernel $K(x,y) = (2\pi)^{-\frac{d}{2}}e^{-\frac{|x-y|^2}{2}}$; right: observable error and MMD for Langevin dynamics (ULA) and SVGD up to $T=100$. Both left and center plots are averaged over 30 experiments. Both birth-death algorithms based on KL and $\chi^2$ converge much faster to equilibrium as $t$ gets larger.
Figure 3: Gaussian mixture example. Top: position of particles at $T=3$; bottom: position of particles at $T=10$. Algorithms based on birth-death are better at attracting particles into under-explored regions.
Figure 4: Bayesian classification problem with dataset "Image". The birth-death Langevin algorithm reaches the desired accuracy and log-likelihood much faster than SVGD.
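The MMD metric reported in Figure 2 can be estimated from two particle clouds. The following is a minimal sketch using the kernel $K(x,y) = (2\pi)^{-d/2}e^{-|x-y|^2/2}$ from the caption; the biased (V-statistic) estimator and the sample sets `x`, `y`, `z` are assumptions for illustration, since the summary does not say which estimator or reference samples the authors used.

```python
import numpy as np

def gaussian_kernel(x, y):
    """K(x, y) = (2*pi)^(-d/2) * exp(-|x - y|^2 / 2) for all pairs of rows."""
    d = x.shape[1]
    sq_dist = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return (2.0 * np.pi) ** (-d / 2.0) * np.exp(-sq_dist / 2.0)

def mmd(x, y):
    """Biased (V-statistic) estimate of the MMD between samples x and y."""
    kxx = gaussian_kernel(x, x).mean()
    kyy = gaussian_kernel(y, y).mean()
    kxy = gaussian_kernel(x, y).mean()
    # Clip tiny negative values caused by round-off before taking the root.
    return float(np.sqrt(max(kxx + kyy - 2.0 * kxy, 0.0)))

# Hypothetical data: two clouds from the same Gaussian and one shifted cloud.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))
y = rng.normal(size=(200, 2))
z = rng.normal(loc=3.0, size=(200, 2))

print("MMD(x, y) =", mmd(x, y))  # same distribution: small
print("MMD(x, z) =", mmd(x, z))  # shifted distribution: larger
```

The pairwise distance array costs $O(nmd)$ memory, which is fine for a few thousand particles but would need batching for larger clouds.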

Theorems & Definitions (49)

Lemma 2.1
proof
Lemma 2.2
proof
Theorem 2.3
Theorem 2.4
proof
Remark 2.5
Remark 2.6
Example 2.7
...and 39 more
