Birth-death dynamics for sampling: Global convergence, approximations and their asymptotics

Yulong Lu, Dejan Slepčev, Lihan Wang

TL;DR

It is proved that, on the torus, smooth and bounded positive solutions of the kernelized dynamics converge on finite time intervals to the pure birth-death dynamics as the kernel bandwidth shrinks to zero.
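The bandwidth limit in the TL;DR can be illustrated numerically on a grid. This is a minimal sketch under stated assumptions. It uses the standard KL birth-death equation $\partial_t\rho = -\rho\,\big(\log(\rho/\pi) - \int \rho\log(\rho/\pi)\big)$ for the pure dynamics. For the kernelized dynamics it uses a hypothetical simplification that replaces $\rho$ inside the logarithm by $K_\varepsilon*\rho$, where $K_\varepsilon$ is a periodic Gaussian; the paper's kernelized energy may differ. The potential `V`, the initial density and the helpers `mollify`, `rate` and `evolve` are illustrative choices, not the authors' code. For the pure dynamics, $\log(\rho_t/\pi)$ decays like $e^{-t}$ up to a constant, so $\rho_t \propto \rho_0^{e^{-t}}\pi^{1-e^{-t}}$. This gives an exact check.

```python
import numpy as np

# Grid on the unit torus [0, 1); all choices below are illustrative.
N = 512
h = 1.0 / N
x = np.arange(N) * h
k = np.fft.fftfreq(N, d=h)                   # integer Fourier frequencies

V = np.cos(4 * np.pi * x)                    # hypothetical nonconvex potential
target = np.exp(-V)
target /= target.sum() * h                   # normalized Gibbs density pi

def mollify(rho, eps):
    """Periodic Gaussian convolution K_eps * rho computed with the FFT."""
    multiplier = np.exp(-2 * np.pi**2 * eps**2 * k**2)
    return np.real(np.fft.ifft(np.fft.fft(rho) * multiplier))

def rate(rho, eps=None):
    """Birth-death rate log(est/pi) minus its rho-average (est = rho or K_eps*rho)."""
    est = rho if eps is None else mollify(rho, eps)
    r = np.log(est / target)
    return r - np.sum(r * rho) * h

def evolve(rho, eps=None, T=1.0, dt=1e-3):
    """Integrate d(rho)/dt = -rho * rate with a positivity-preserving step."""
    for _ in range(int(round(T / dt))):
        rho = rho * np.exp(-dt * rate(rho, eps))
        rho = rho / (rho.sum() * h)          # remove O(dt^2) mass drift
    return rho

rho0 = 1.0 + 0.5 * np.sin(2 * np.pi * x)
rho0 /= rho0.sum() * h
rho_pure = evolve(rho0)
errs = [np.max(np.abs(evolve(rho0, eps) - rho_pure)) for eps in (0.1, 0.05, 0.025)]
print(errs)                                  # sup-norm gap at T = 1 for each bandwidth
```

The multiplicative update $\rho \mapsto \rho\, e^{-\Delta t\,\text{rate}}$ keeps the density positive, which the kernelized logarithm requires. The printed gaps should shrink as $\varepsilon$ decreases, consistent with the finite-time convergence stated above.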

Abstract

Motivated by the challenge of sampling Gibbs measures with nonconvex potentials, we study a continuum birth-death dynamics. We improve results in previous works [51,57] and provide weaker hypotheses under which the probability density of the birth-death dynamics governed by the Kullback-Leibler divergence or by the $χ^2$ divergence converges exponentially fast to the Gibbs equilibrium measure, with a universal rate that is independent of the potential barrier. To build a practical numerical sampler based on the pure birth-death dynamics, we consider an interacting particle system that is inspired by the gradient flow structure and the classical Fokker-Planck equation and relies on kernel-based approximations of the measure. Using the technique of $Γ$-convergence of gradient flows, we show that, on the torus, smooth and bounded positive solutions of the kernelized dynamics converge on finite time intervals to the pure birth-death dynamics as the kernel bandwidth shrinks to zero. Moreover, we provide quantitative estimates on the bias of minimizers of the energy corresponding to the kernelized dynamics. Finally, we prove long-time asymptotic results on the convergence of the asymptotic states of the kernelized dynamics towards the Gibbs measure.
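The interacting particle system mentioned above replaces $\rho$ by a kernel density estimate built from $N$ particles. This is a hedged illustration only. The sketch follows the kill/duplicate scheme of the earlier birth-death sampler of Lu, Lu and Nolen ("Accelerating Langevin sampling with birth-death"). Each particle gets a rate $\beta_i = \log(K_\varepsilon*\mu_N)(x_i) + V(x_i)$, centred by its particle average. With probability $1-e^{-|\beta_i|\Delta t}$, the particle is either killed ($\beta_i>0$) and replaced by a copy of another particle, or duplicated ($\beta_i<0$) over another particle. The double-well `V`, the bandwidth, the step size and the function names are hypothetical. The paper's kernelized dynamics, derived from a gradient-flow energy, may use a different rate.

```python
import numpy as np

def V(x):
    """Hypothetical double-well potential; target pi is proportional to exp(-V)."""
    return 0.25 * (x**2 - 4.0) ** 2

def kde(x, eps):
    """Gaussian kernel density estimate (K_eps * mu_N) evaluated at the particles."""
    diff = x[:, None] - x[None, :]
    return np.exp(-diff**2 / (2 * eps**2)).mean(axis=1) / np.sqrt(2 * np.pi * eps**2)

def birth_death_step(x, eps, dt, rng):
    """One sweep of kill/duplicate jumps; the particle number stays fixed."""
    x = x.copy()
    n = x.size
    beta = np.log(kde(x, eps)) + V(x)
    beta -= beta.mean()                      # centred rate, mirroring mass conservation
    for i in range(n):
        if rng.random() < 1.0 - np.exp(-abs(beta[i]) * dt):
            j = rng.integers(n - 1)          # pick another particle j != i
            if j >= i:
                j += 1
            if beta[i] > 0:
                x[i] = x[j]                  # kill i, duplicate j
            else:
                x[j] = x[i]                  # duplicate i, kill j
    return x

rng = np.random.default_rng(0)
x0 = rng.uniform(-4.0, 4.0, size=400)
x = x0
for _ in range(40):                          # T = 40 * 0.05 = 2
    x = birth_death_step(x, eps=0.3, dt=0.05, rng=rng)
print(V(x0).mean(), V(x).mean())             # mean potential energy drops
```

Pure birth-death only reweights the positions already present: every final particle is a copy of some initial one. In practice it is therefore combined with a transport step, as in the birth-death Langevin algorithm of Figure 4.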

Paper Structure (14 sections, 21 theorems, 161 equations, 4 figures, 1 table)

Key Result

Lemma 2.1

Suppose $\{\rho_n\}_{n=1}^\infty$ and $\rho$ are measures on $\mathbb{R}^d$ and are all absolutely continuous with respect to some measure $\lambda$. Suppose also that $\rho$ has finite total mass. Then As a consequence of eqn:dHdSHequiv, if we further assume $\rho_n,\rho$ are probability measures on $\mathbb{R}^d$, then

Figures (4)

  • Figure 1: 1D torus example. Left: evolution of $\mathrm{KL}(\rho^{(\varepsilon)}_t \,\|\, \pi)$ for various $\varepsilon$; for each fixed $\varepsilon$ it heuristically levels off at a fixed value as $t\to\infty$. Right: dependence of $\mathrm{KL}(\rho^{(\varepsilon)}_\infty \,\|\, \pi)$ on $\varepsilon$, which scales like $O(\varepsilon^2)$ as $\varepsilon\to 0$.
  • Figure 2: Gaussian mixture example. Left: error of the observable $f(x,y) = x^2/3+y^2/5$; center: MMD with kernel $K(x,y) = (2\pi)^{-\frac{d}{2}}e^{-\frac{|x-y|^2}{2}}$; right: observable error and MMD for the unadjusted Langevin algorithm (ULA) and SVGD up to $T=100$. The left and center plots are averaged over 30 experiments. Both birth-death algorithms, based on KL and on $\chi^2$, converge to equilibrium much faster as $t$ grows.
  • Figure 3: Gaussian mixture example. Top: position of particles at $T=3$; bottom: position of particles at $T=10$. Algorithms based on birth-death are better at attracting particles into under-explored regions.
  • Figure 4: Bayesian classification problem with dataset "Image". The birth-death Langevin algorithm reaches the desired accuracy and log-likelihood much faster than SVGD.
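The two metrics in Figure 2 can be computed from particle samples as follows. This is a minimal sketch. It assumes the biased (V-statistic) MMD estimator with the Gaussian kernel named in the caption. Whether the plots show MMD or its square is not stated here. The observable is $f(x,y)=x^2/3+y^2/5$. The reference mixture below is a hypothetical stand-in, not the paper's target, and `mmd` and `observable` are illustrative names.

```python
import numpy as np

def gauss_kernel(a, b):
    """K(x, y) = (2*pi)^(-d/2) * exp(-|x - y|^2 / 2) for rows of a (n, d) and b (m, d)."""
    d = a.shape[1]
    sq = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return (2 * np.pi) ** (-d / 2) * np.exp(-sq / 2)

def mmd(x, y):
    """Biased (V-statistic) MMD estimate between samples x and y."""
    val = gauss_kernel(x, x).mean() + gauss_kernel(y, y).mean() - 2 * gauss_kernel(x, y).mean()
    return np.sqrt(max(val, 0.0))            # clip tiny negative round-off

def observable(p):
    """Particle average of f(x, y) = x^2/3 + y^2/5 for p of shape (n, 2)."""
    return np.mean(p[:, 0] ** 2 / 3 + p[:, 1] ** 2 / 5)

rng = np.random.default_rng(1)
# Hypothetical two-component reference sample, used only for illustration.
ref = np.concatenate([rng.normal([-2.0, 0.0], 1.0, (500, 2)),
                      rng.normal([2.0, 0.0], 1.0, (500, 2))])
particles = rng.normal(0.0, 1.0, (300, 2))
print(mmd(particles, ref), abs(observable(particles) - observable(ref)))
```

In the figure, the observable error would be measured against the exact expectation under the target. A large reference sample is used here only as a stand-in.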

Theorems & Definitions (49)

  • Lemma 2.1
  • proof
  • Lemma 2.2
  • proof
  • Theorem 2.3
  • Theorem 2.4
  • proof
  • Remark 2.5
  • Remark 2.6
  • Example 2.7
  • ...and 39 more