Fast Convergence of $\Phi$-Divergence Along the Unadjusted Langevin Algorithm and Proximal Sampler

Siddharth Mitra, Andre Wibisono

TL;DR

The paper analyzes the mixing times of two Langevin-based samplers, the Unadjusted Langevin Algorithm (ULA) and the Proximal Sampler, in the general framework of $\Phi$-divergences. It combines a Strong Data Processing Inequality (SDPI) approach with $\Phi$-Sobolev inequalities ($\Phi$-SI) to obtain exponential decay bounds for $\mathsf{D}_{\Phi}$, valid whenever the stationary distribution satisfies a $\Phi$-SI. For ULA, the iterates converge exponentially to the biased limit $\nu^{\eta}$ with rate $\left(1+\frac{2\alpha\eta}{(1+\eta L)^2}\right)^{-k}$ under $L$-smoothness and a $\Phi$-SI for $\nu^{\eta}$; for the Proximal Sampler, the iterates converge to the target $\nu^X$ with rate $\left(1+\alpha\eta\right)^{-2k}$, assuming $\nu^X$ satisfies a $\Phi$-SI. The work unifies and extends known results for KL divergence (relative entropy) and chi-squared divergence to the full class of twice-differentiable $\Phi$-divergences, and uses Gaussian examples to show that the rates are tight in the KL case, with practical implications for sampling accuracy and algorithm design.
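To make the two rates quoted above concrete, the following minimal sketch evaluates the per-step contraction factors and the number of iterations needed to shrink $\mathsf{D}_{\Phi}$ by a given factor. The function names `ula_rate`, `proximal_rate`, and `steps_to_shrink` are hypothetical and not taken from the paper; the step-size check $0 < \eta \leq 1/L$ is the condition stated in Theorem 1, and for ULA `alpha` is assumed to be the $\Phi$-SI constant of the biased limit $\nu^{\eta}$.

```python
import math


def ula_rate(alpha: float, eta: float, L: float) -> float:
    """Per-step factor (1 + 2*alpha*eta / (1 + eta*L)^2)^(-1) quoted in the summary.

    alpha is assumed to be the Phi-Sobolev constant of the biased limit nu^eta.
    """
    if not (0.0 < eta <= 1.0 / L):
        raise ValueError("expected 0 < eta <= 1/L")
    return 1.0 / (1.0 + 2.0 * alpha * eta / (1.0 + eta * L) ** 2)


def proximal_rate(alpha: float, eta: float) -> float:
    """Per-step factor (1 + alpha*eta)^(-2) quoted in the summary."""
    if eta <= 0.0:
        raise ValueError("expected eta > 0")
    return 1.0 / (1.0 + alpha * eta) ** 2


def steps_to_shrink(rate: float, factor: float) -> int:
    """Smallest k with rate**k <= factor, for 0 < rate < 1 and 0 < factor < 1."""
    return math.ceil(math.log(factor) / math.log(rate))


if __name__ == "__main__":
    alpha, L, eta = 1.0, 4.0, 0.25  # illustrative values, not from the paper
    for name, r in [("ULA", ula_rate(alpha, eta, L)),
                    ("Proximal Sampler", proximal_rate(alpha, eta))]:
        print(f"{name}: per-step factor {r:.4f}, "
              f"steps for a 1e-3 reduction: {steps_to_shrink(r, 1e-3)}")
```

The ULA count measures distance to $\nu^{\eta}$, not to the original target, so the two iteration counts are not directly comparable.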

Abstract

We study the mixing time of two popular discrete-time Markov chains in continuous space, the Unadjusted Langevin Algorithm and the Proximal Sampler, which are discretizations of the Langevin dynamics. We extend mixing time analyses for these Markov chains to hold in $\Phi$-divergence. We show that any $\Phi$-divergence arising from a twice-differentiable strictly convex function $\Phi$ converges to $0$ exponentially fast along these Markov chains, under the assumption that their stationary distributions satisfy the corresponding $\Phi$-Sobolev inequality, which holds for example when the target distribution of the Langevin dynamics is strongly log-concave. Our setting includes as special cases popular mixing time regimes, namely the mixing in chi-squared divergence under a Poincaré inequality, and the mixing in relative entropy under a log-Sobolev inequality. Our results follow by viewing the sampling algorithms as noisy channels and bounding the contraction coefficients arising in the appropriate strong data processing inequalities.
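As an illustration of the relative-entropy special case, the sketch below tracks the Proximal Sampler in closed form on a one-dimensional Gaussian target and compares the KL divergence with the $(1+\alpha\eta)^{-2k}$ bound. It rests on assumptions not stated in this summary: the target is $N(0, 1/\alpha)$, the chain starts from a Gaussian, and the Proximal Sampler takes its standard two-step form (a forward Gaussian noise step followed by an exact backward conditional draw). The names `kl_gauss`, `proximal_step`, and `kl_trajectory` are hypothetical.

```python
import math


def kl_gauss(m: float, v: float, s: float) -> float:
    """KL( N(m, v) || N(0, s) ) for one-dimensional Gaussians."""
    return 0.5 * (v / s - 1.0 - math.log(v / s) + m * m / s)


def proximal_step(m: float, v: float, alpha: float, eta: float):
    """One assumed Proximal Sampler step for the target N(0, 1/alpha).

    Forward:  y = x + sqrt(eta) * z, with z ~ N(0, 1).
    Backward: x | y ~ N(y / (1 + alpha*eta), eta / (1 + alpha*eta)).
    A Gaussian iterate stays Gaussian, so (mean, variance) suffices.
    """
    c = 1.0 + alpha * eta
    return m / c, (v + eta) / c ** 2 + eta / c


def kl_trajectory(m0: float, v0: float, alpha: float, eta: float, steps: int):
    """KL to the target after 0, 1, ..., steps iterations."""
    m, v, out = m0, v0, []
    for _ in range(steps + 1):
        out.append(kl_gauss(m, v, 1.0 / alpha))
        m, v = proximal_step(m, v, alpha, eta)
    return out


if __name__ == "__main__":
    alpha, eta = 1.0, 0.5  # illustrative values, not from the paper
    kls = kl_trajectory(m0=3.0, v0=4.0, alpha=alpha, eta=eta, steps=10)
    for k, kl in enumerate(kls):
        bound = kls[0] * (1.0 + alpha * eta) ** (-2 * k)
        print(f"k={k:2d}  KL={kl:.3e}  bound={bound:.3e}")
```

In this Gaussian setting the mean contribution to the KL divergence shrinks by exactly $(1+\alpha\eta)^{-2}$ per step, which is consistent with the summary's remark that Gaussian examples show tightness in the KL case.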

Paper Structure

This paper contains 31 sections, 15 theorems, 71 equations, 1 table.

Key Result

Theorem 1

Suppose the stationary distribution $\nu^{\eta}$ of ULA satisfies a $\Phi$-Sobolev inequality with optimal constant $\alpha>0$, and $\nu$ is $L$-smooth for some $0 < \alpha \leq L < \infty$. Let $X_k \sim \rho_k$ evolve following the ULA update (eq:ULA_RandomVariableUpdate) with step size $0 < \eta \leq 1/L$ fro

Theorems & Definitions (29)

  • Definition 1
  • Theorem 1
  • Theorem 2
  • Corollary 1
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Corollary 2
  • proof
  • ...and 19 more