Sampling with Adaptive Variance for Multimodal Distributions

Björn Engquist, Kui Ren, Yunan Yang

TL;DR

The paper shows that a derivative-free version of its adaptive-variance dynamics can sample without gradient information of the Gibbs potential, and that for Gibbs distributions with nonconvex potentials this approach could converge significantly faster than the classical overdamped Langevin dynamics.

Abstract

We propose and analyze a class of adaptive sampling algorithms for multimodal distributions on a bounded domain, which share a structural resemblance to the classic overdamped Langevin dynamics. We first demonstrate that this class of linear dynamics with adaptive diffusion coefficients and vector fields can be interpreted and analyzed as weighted Wasserstein gradient flows of the Kullback-Leibler (KL) divergence between the current distribution and the target Gibbs distribution, which directly leads to the exponential convergence of both the KL and $\chi^2$ divergences, with rates depending on the weighted Wasserstein metric and the Gibbs potential. We then show that a derivative-free version of the dynamics can be used for sampling without gradient information of the Gibbs potential and that for Gibbs distributions with nonconvex potentials, this approach could achieve significantly faster convergence than the classical overdamped Langevin dynamics. A comparison of the mean transition times between local minima of a nonconvex potential further highlights the better efficiency of the derivative-free dynamics in sampling.
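To make the contrast in the abstract concrete, here is a minimal sketch of the two kinds of dynamics in one dimension on the unit torus. Everything in it is an assumption made for illustration: the double-well potential `potential`, the step size, and in particular the form of the derivative-free SDE, $dX_t = \sqrt{2\,e^{F(X_t)}}\,dW_t$ (Itô). Its Fokker-Planck equation $\partial_t \rho = \partial_{xx}(e^{F}\rho)$ has $\rho \propto e^{-F}$ as a stationary density on the torus, but the abstract does not state the paper's equation, so this may differ from the authors' dynamics.

```python
import math
import random


def potential(x, c=1.0):
    """Hypothetical periodic double-well potential F on [0, 1); wells at 1/4 and 3/4."""
    return c * math.cos(4.0 * math.pi * x)


def grad_potential(x, c=1.0):
    """Derivative F'(x) of the hypothetical potential above."""
    return -4.0 * math.pi * c * math.sin(4.0 * math.pi * x)


def langevin_step(x, dt, rng, c=1.0):
    """One Euler-Maruyama step of dX = -F'(X) dt + sqrt(2) dW, wrapped to the torus."""
    noise = rng.gauss(0.0, 1.0)
    return (x - grad_potential(x, c) * dt + math.sqrt(2.0 * dt) * noise) % 1.0


def derivative_free_step(x, dt, rng, c=1.0):
    """One Euler-Maruyama step of the assumed SDE dX = sqrt(2 exp(F(X))) dW (Ito).

    Only values of F are evaluated, never its gradient. The diffusion is
    smallest near the minima of F and largest near the barriers.
    """
    noise = rng.gauss(0.0, 1.0)
    sigma = math.sqrt(2.0 * math.exp(potential(x, c)))
    return (x + sigma * math.sqrt(dt) * noise) % 1.0


def run(step, n_particles=200, n_steps=500, dt=1e-3, seed=0, c=1.0):
    """Evolve independent particles from a uniform start with the given step rule."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n_particles)]
    for _ in range(n_steps):
        xs = [step(x, dt, rng, c) for x in xs]
    return xs
```

Calling `run(langevin_step)` and `run(derivative_free_step)` with the same seed gives two particle clouds whose histograms can be compared against $e^{-F}$. The step size and particle count here are placeholders, not tuned values.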

Paper Structure

This paper contains 16 sections, 3 theorems, 65 equations, 6 figures.

Key Result

Theorem 3.1

Consider Equation (eq:weighted W2 GF KL_1), the weighted Wasserstein gradient flow of the energy $\mathcal{E}(\rho) = \text{KL}(\rho||\pi_G)$. Assume that the initial distribution has enough regularity that the strong solution to the equation exists for $t\in [0, \infty)$. Then the dynamics converge exponentially fast to the unique steady-state distribution $\pi_G$ given in Equation (EQ:Gibbs).

Figures (6)

  • Figure 1: Overdamped Langevin \ref{eq:overdamped Langevin} and the derivative-free \ref{eq:diffuse_div_free} dynamics are two weighted Wasserstein gradient flows of the energy $\mathcal{E}(\rho) = \text{KL}(\rho||\pi_G)$, yielding two curves in the space of probability distributions $\mathcal{P}(\mathbb T^d)$ starting from the same initial distribution $\rho_0$. Their convergence properties depend on different features of the Gibbs distribution $\pi_G$ \ref{EQ:Gibbs}.
  • Figure 2: A double-well potential (a) and its corresponding Gibbs distribution (b).
  • Figure 3: (a) Double-well potentials considered in Section \ref{subset:conv} and (b) their corresponding multimodal Gibbs distributions.
  • Figure 4: Convergence behavior of the KL divergence $\text{KL}(\rho_t||\pi_G)$ and the $\chi^2$ divergence $\chi^2(\rho_t||\pi_G)$ for the overdamped Langevin dynamics \ref{eq:overdamped Langevin} and the derivative-free dynamics \ref{eq:diffuse_div_free}. As shown in Figure \ref{fig:double-well-conv}, we consider a class of double-well potentials. The dashed lines are the results of the Langevin dynamics, and the solid lines are for the derivative-free dynamics. The cases of $c=1$, $c=5$, and $c=9$ are plotted in black, red, and blue, respectively.
  • Figure 5: (a) The Gibbs potential $F$, (b) the target Gibbs distribution $\pi_G(\mathbf x) \varpropto \exp{(-20 F(\mathbf x))}$, (c) the probability distribution of the derivative-free dynamics at $T=10$, and (d) the probability distribution of the overdamped Langevin dynamics at $T=10$.
  • ...and 1 more figure

Theorems & Definitions (7)

  • Theorem 3.1
  • proof
  • Theorem 3.2
  • proof
  • Definition A.1: $A_p$ weights
  • Theorem A.2: Weighted Poincaré inequality [perez2019degenerate]
  • Remark A.3