Unbiased Kinetic Langevin Monte Carlo with Inexact Gradients

Neil K. Chada, Benedict Leimkuhler, Daniel Paulin, Peter A. Whalley

TL;DR

This work introduces UBUBU, an unbiased multilevel Monte Carlo estimator for Bayesian posteriors based on kinetic Langevin dynamics with an advanced splitting scheme (UBU). By coupling discretization levels and leveraging exact, stochastic, and approximate gradients, the method eliminates discretization bias without Metropolis corrections and achieves finite variance with a central limit theorem. Theoretical results show the estimator's gradient-evaluation cost scales as O(d^{1/4} ε^{-2}) under strong Hessian-Lipschitz conditions, and remains efficient for large datasets, including product distributions where variance is dimension-free. Empirically, UBUBU outperforms randomized HMC in high-dimensional problems (MNIST multinomial regression and Poisson regression), often by 2–3 orders of magnitude in gradient evaluations per effective sample, demonstrating substantial practical impact for large-scale Bayesian inference.

Abstract

We present an unbiased method for Bayesian posterior means based on kinetic Langevin dynamics that combines advanced splitting methods with enhanced gradient approximations. Our approach avoids Metropolis correction by coupling Markov chains at different discretization levels in a multilevel Monte Carlo approach. Theoretical analysis demonstrates that our proposed estimator is unbiased, attains finite variance, and satisfies a central limit theorem. It can achieve accuracy $ε>0$ for estimating expectations of Lipschitz functions in $d$ dimensions with $\mathcal{O}(d^{1/4}ε^{-2})$ expected gradient evaluations, without assuming warm start. We exhibit similar bounds using both approximate and stochastic gradients, and our method's computational cost is shown to scale independently of the size of the dataset. The proposed method is tested using a multinomial regression problem on the MNIST dataset and a Poisson regression model for soccer scores. Experiments indicate that the number of gradient evaluations per effective sample is independent of dimension, even when using inexact gradients. For product distributions, we give dimension-independent variance bounds. Our results demonstrate that in large-scale applications, the unbiased algorithm we present can be 2-3 orders of magnitude more efficient than the "gold-standard" randomized Hamiltonian Monte Carlo.
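
As a rough reading of the stated bound, assume (hypothetically) that the expected cost behaves like $C\,d^{1/4}ε^{-2}$ for a constant $C$ that depends on neither $d$ nor $ε$. The $\mathcal{O}(\cdot)$ statement alone does not guarantee this, so the ratios below are only indicative:

$$
\frac{C\,(16d)^{1/4}\,ε^{-2}}{C\,d^{1/4}\,ε^{-2}} = 16^{1/4} = 2,
\qquad
\frac{C\,d^{1/4}\,(ε/2)^{-2}}{C\,d^{1/4}\,ε^{-2}} = 2^{2} = 4 .
$$

Under that assumption, a 16-fold increase in dimension would only double the expected number of gradient evaluations, while halving the target accuracy would quadruple it.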

Paper Structure (42 sections, 53 theorems, 417 equations, 11 figures, 4 tables, 6 algorithms)

Key Result

Proposition 3.5

Suppose that Assumptions ass:var, ass:numbsamp, ass:comp and ass:independence hold, and that $2<\phi_N<\phi_D$. Then $S$ as defined in eq:Sdef is an unbiased estimator of $\mu(f)$ that has finite variance and finite expected computational cost. Similarly, for any $0\le c_R< \frac{1}{\phi_N^{1/2}}$, $S(c_R)$ as defined in eq:SRichardsondef is also an unbiased estimator of $\mu(f)$ with finite variance ...
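
The role of a condition like $2<\phi_N<\phi_D$ can be illustrated with a generic single-term randomized multilevel estimator. This is a minimal sketch under stated assumptions, not the estimator $S$ from eq:Sdef: the toy level differences, the geometric level distribution, and all names and parameter values (`toy_difference`, `a`, `phi_n = 4`, `phi_d = 16`) are hypothetical. In this toy, the variance of the level-$l$ difference decays like $\phi_D^{-l}$, level $l$ is drawn with probability proportional to $\phi_N^{-l}$, and the cost of level $l$ is assumed to grow like $2^l$. The expected cost $\sum_l 2^l \phi_N^{-l}$ is then finite when $\phi_N>2$, and the variance sum $\sum_l \phi_D^{-l}\phi_N^{l}$ is finite when $\phi_N<\phi_D$.

```python
import math
import random


def sample_level(phi_n, rng):
    """Draw a level L >= 0 with P(L = l) = (1 - 1/phi_n) * phi_n**(-l)."""
    r = 1.0 / phi_n
    u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is finite
    return int(math.log(u) / math.log(r))


def level_prob(level, phi_n):
    """Probability mass of the geometric level distribution above."""
    r = 1.0 / phi_n
    return (1.0 - r) * r ** level


def toy_difference(level, rng, mu=1.0, a=0.25, phi_d=16.0):
    """Hypothetical level difference Delta_l (illustration only).

    Its means sum to mu over all levels, and its standard deviation
    decays like phi_d**(-level / 2), i.e. its variance like phi_d**(-level).
    """
    mean = mu * (1.0 - a) * a ** level
    return mean + phi_d ** (-level / 2.0) * rng.gauss(0.0, 1.0)


def single_term_estimate(rng, phi_n=4.0):
    """One unbiased draw: a random level difference reweighted by 1 / P(L = l)."""
    level = sample_level(phi_n, rng)
    return toy_difference(level, rng) / level_prob(level, phi_n)


def run_toy(n_samples, seed=0):
    """Average many independent draws; return (sample mean, sample variance)."""
    rng = random.Random(seed)
    draws = [single_term_estimate(rng) for _ in range(n_samples)]
    mean = sum(draws) / n_samples
    var = sum((z - mean) ** 2 for z in draws) / (n_samples - 1)
    return mean, var


if __name__ == "__main__":
    mean, var = run_toy(200_000)
    print(f"estimate of mu: {mean:.4f}, per-draw variance: {var:.3f}")
```

With the toy values above, the sample mean should settle near `mu = 1.0` even though no single level is unbiased on its own; choosing `phi_n` at or above `phi_d` would make the per-draw variance diverge.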

Figures (11)

  • Figure 1: Coupled sample paths, based on synchronous coupling, from the UBU discretization scheme (Section \ref{sec:back}) of the kinetic Langevin diffusion for a Gaussian target, at stepsize pairs $h=1.5, 0.75$ and $h=0.75, 0.375$. UBU is strong order 2, so the typical distance between coupled paths is $\mathcal{O}(h^2)$.
  • Figure 2: Elimination of bias by increasing burn-in lengths at higher discretization levels.
  • Figure 3: Coupling scheme for UBUBU-SG.
  • Figure 4: Dimension dependence of gradients/ESS for test function $\|x\|$ for Gaussian targets.
  • Figure 5: Dimensional dependence of gradients/ESS over all components for Gaussian targets. Error bars represent bootstrap confidence intervals.
  • ...and 6 more figures
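
The synchronous coupling mentioned in the Figure 1 caption can be sketched on a much simpler scheme. The example below is an assumption-laden illustration, not UBU: it swaps in Euler-Maruyama for overdamped Langevin dynamics on a one-dimensional standard Gaussian target, and the function name and parameters are hypothetical. The coarse path takes one step of size `h_coarse` while the fine path takes two steps of half that size, and both are driven by the same Brownian increments. For this simpler scheme the coupled distance is expected to shrink roughly linearly in the stepsize, rather than at the $\mathcal{O}(h^2)$ rate the caption attributes to UBU.

```python
import math
import random


def coupled_rms_distance(h_coarse, t_end=5.0, n_paths=500, seed=0):
    """RMS distance at time t_end between synchronously coupled
    Euler-Maruyama paths with stepsizes h_coarse and h_coarse / 2.

    Target: standard Gaussian, U(x) = x**2 / 2, so grad U(x) = x and the
    (swapped-in) dynamics are dX = -X dt + sqrt(2) dW.
    """
    rng = random.Random(seed)
    h_fine = h_coarse / 2.0
    n_coarse_steps = int(round(t_end / h_coarse))
    noise_scale = math.sqrt(2.0)
    total_sq = 0.0
    for _path in range(n_paths):
        x_coarse = 0.0
        x_fine = 0.0  # both levels start from the same point
        for _step in range(n_coarse_steps):
            # Two fine Brownian increments; their sum drives the coarse step.
            dw1 = rng.gauss(0.0, math.sqrt(h_fine))
            dw2 = rng.gauss(0.0, math.sqrt(h_fine))
            x_fine = x_fine - h_fine * x_fine + noise_scale * dw1
            x_fine = x_fine - h_fine * x_fine + noise_scale * dw2
            x_coarse = x_coarse - h_coarse * x_coarse + noise_scale * (dw1 + dw2)
        total_sq += (x_fine - x_coarse) ** 2
    return math.sqrt(total_sq / n_paths)


if __name__ == "__main__":
    for h in (0.2, 0.1, 0.05):
        print(f"h = {h:<5} coupled RMS distance = {coupled_rms_distance(h):.4f}")
```

Because the two levels share their noise, the difference between them is far smaller than the spread of either path alone, which is what makes the level differences in a multilevel estimator cheap to estimate.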

Theorems & Definitions (132)

  • Definition 2.1
  • Remark 3.5
  • Proposition 3.5
  • proof
  • Theorem 3.6
  • proof
  • Definition 3.7
  • Remark 3.8
  • Remark 3.9
  • Remark 3.10
  • ...and 122 more