A Langevin sampling algorithm inspired by the Adam optimizer

Benedict Leimkuhler, René Lohmann, Peter Whalley

TL;DR

We address the challenge of efficiently sampling from the canonical density $\pi_{\beta}(x) \propto \exp(-\beta U(x))$ in high dimensions by introducing SamAdams, an adaptive-stepsize Langevin framework that augments the phase space with an auxiliary variable $\zeta$ and employs a Sundman time transformation $dt=\psi(\zeta)\,d\tau$. The dynamics are driven by an Adam-inspired monitor function $g$ that yields a moving-average control over the step size $\Delta t=\psi(\zeta)\Delta\tau$, while reweighting via $\psi(\zeta)$ recovers canonical expectations; the method is discretized with a splitting scheme (ZBAOABZ) and has provable weak convergence of order $O(\Delta\tau^2)$. Empirically, SamAdams improves stability and exploration speed across diverse problems, from asymmetric double wells and Neal's funnel to MNIST-based Bayesian neural networks, often allowing substantially larger mean stepsizes than fixed-step schemes without sacrificing accuracy. The framework is modular, compatible with existing fixed-step integrators, and poised to impact large-scale Bayesian inference and machine learning by reducing tuning effort and improving robustness in complex loss landscapes.
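The reweighting step can be written out explicitly. The following is a sketch, assuming the underlying Langevin dynamics is ergodic in real time $t$; the observable $\phi$, the samples $(x_n,\zeta_n)$ and the horizons $T$, $\mathcal{T}$ are notation introduced here for illustration. Because $dt=\psi(\zeta)\,d\tau$, a real-time average becomes a $\psi$-weighted average along the rescaled trajectory:

$$
\mathbb{E}_{\pi_\beta}[\phi]
= \lim_{T\to\infty}\frac{1}{T}\int_0^{T}\phi(x(t))\,dt
= \lim_{\mathcal{T}\to\infty}\frac{\int_0^{\mathcal{T}}\phi(x(\tau))\,\psi(\zeta(\tau))\,d\tau}{\int_0^{\mathcal{T}}\psi(\zeta(\tau))\,d\tau}
\approx \frac{\sum_{n}\phi(x_n)\,\psi(\zeta_n)}{\sum_{n}\psi(\zeta_n)} .
$$

Samples taken at equal increments $\Delta\tau$ are therefore weighted by $\psi(\zeta_n)$, which is proportional to the real time $\Delta t_n=\psi(\zeta_n)\Delta\tau$ that each step covers.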

Abstract

We present a framework for adaptive-stepsize MCMC sampling based on time-rescaled Langevin dynamics, in which the stepsize variation is dynamically driven by an additional degree of freedom. Our approach augments the phase space by an additional variable which in turn defines a time reparameterization. The use of an auxiliary relaxation equation allows accumulation of a moving average of a local monitor function and provides for precise control of the timestep while circumventing the need to modify the drift term in the physical system. Our algorithm is straightforward to implement and can be readily combined with any off-the-peg fixed-stepsize Langevin integrator. As a particular example, we consider control of the stepsize by monitoring the norm of the log-posterior gradient, which takes inspiration from the Adam optimizer, the stepsize being automatically reduced in regions of steep change of the log posterior and increased on plateaus, improving numerical stability and convergence speed. As in Adam, the stepsize variation depends on the recent history of the gradient norm, which enhances stability and improves accuracy compared to more immediate control approaches. We demonstrate the potential benefit of this method--both in accuracy and in stability--in numerical experiments including Neal's funnel and a Bayesian neural network for classification of MNIST data.
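To make the mechanism concrete, here is a minimal Python sketch of one way such a scheme could look; it is not the authors' reference implementation. Assumptions not stated in this summary: the auxiliary variable relaxes as $d\zeta/d\tau=-\alpha\zeta+g$ with monitor $g=\|\nabla U\|^{s}$ held fixed over each half step, the rescaling has the bounded form $\psi(\zeta)=m(\zeta^{r}+M)/(\zeta^{r}+m)$, and all function names and default parameters (`m`, `M`, `r`, `alpha`, `s`) are hypothetical.

```python
import numpy as np

def psi(zeta, m=0.1, M=1.0, r=1.0):
    """Assumed bounded rescaling: psi(0) = M, psi -> m as zeta -> infinity."""
    zr = zeta ** r
    return m * (zr + M) / (zr + m)

def z_half_step(zeta, g_val, alpha, dtau):
    """Exact flow of d(zeta)/d(tau) = -alpha*zeta + g over dtau/2, g frozen."""
    decay = np.exp(-0.5 * alpha * dtau)
    return decay * zeta + (1.0 - decay) / alpha * g_val

def samadams_sketch(grad_U, x0, n_steps, dtau, beta=1.0, gamma=1.0,
                    alpha=1.0, s=2.0, seed=0):
    """Z-B-A-O-A-B-Z steps; returns samples and their reweighting factors."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    p = np.zeros_like(x)
    zeta = 0.0
    xs = np.empty((n_steps,) + x.shape)
    weights = np.empty(n_steps)
    force = -grad_U(x)
    for n in range(n_steps):
        # Z: half step for the moving average of the gradient-norm monitor
        zeta = z_half_step(zeta, np.linalg.norm(force) ** s, alpha, dtau)
        dt = psi(zeta) * dtau            # real stepsize for this iteration
        p += 0.5 * dt * force            # B: half kick
        x += 0.5 * dt * p                # A: half drift
        c = np.exp(-gamma * dt)          # O: exact Ornstein-Uhlenbeck update
        p = c * p + np.sqrt((1.0 - c * c) / beta) * rng.standard_normal(x.shape)
        x += 0.5 * dt * p                # A: half drift
        force = -grad_U(x)
        p += 0.5 * dt * force            # B: half kick
        # Z: second half step, using the gradient at the new position
        zeta = z_half_step(zeta, np.linalg.norm(force) ** s, alpha, dtau)
        xs[n] = x
        weights[n] = psi(zeta)
    return xs, weights

def reweighted_mean(values, weights):
    """psi-weighted average, intended to approximate a canonical expectation."""
    return np.sum(weights * values) / np.sum(weights)

if __name__ == "__main__":
    # Toy target (an assumption for the demo): U(x) = x^2 / 2, so E[x^2] = 1/beta
    xs, w = samadams_sketch(lambda x: x, np.zeros(1), 50_000, dtau=0.5)
    print("reweighted E[x^2]:", reweighted_mean(xs[:, 0] ** 2, w))
    print("mean stepsize    :", 0.5 * w.mean())
```

The inner B-A-O-A-B block is an ordinary fixed-stepsize BAOAB step evaluated at `dt`, which is why the abstract can describe the approach as combinable with existing fixed-stepsize integrators: only the two `z_half_step` calls and the weights are new.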

Paper Structure

This paper contains 27 sections, 1 theorem, 70 equations, 27 figures, 3 tables, 1 algorithm.

Key Result

Theorem 1

Consider the system (eq: full_framework1)-(eq: full_framework2) and assume that $\psi$, $\sqrt{\psi}$, $\nabla U$, $g$ are $C^{6}$ functions with all partial derivatives bounded; further assume that $r < \psi < M$ for some $r, M > 0$. Then consider a splitting of the form ZBAOABZ and generating a s where $C >0$ is independent of $\Delta \tau >0$ and $(X(\cdot),P(\cdot),\zeta(\cdot))$ is the solut

Figures (27)

  • Figure 1: Sampling trajectories of a constant-stepsize integrator (BAOAB) and our adaptive-stepsize scheme (SamAdams) on a star-shaped landscape $U(x,y)= x^2 + 1000x^2y^2 + y^2$. Left: Potential $U(x,y)$ with trajectories. BAOAB was run at the mean stepsize used by SamAdams (obtained by averaging over all iterations). Right: Stepsize values $\Delta t$ used by SamAdams are binned by distance to the origin $r=\sqrt{x^2+y^2}$ together with the mean stepsize (blue dashed line) and the maximum stable stepsize for BAOAB (red dashed line). SamAdams uses a small stepsize only at the outer points of the stable domain.
  • Figure 2: SamAdams sampling procedure.
  • Figure 3: Sampling experiments on a 1D toy model (see text). a) Potential and Gibbs density for employed temperature $T=0.4$. b) $x$-coordinate and adaptive stepsize $\Delta t$ for SamAdams along a single trajectory. The black dashed line gives the value of the virtual stepsize $\Delta \tau$, which is adaptively increased or reduced to yield the real stepsize $\Delta t$. c) Absolute mean errors of two observables, the $x$-coordinate and the occupation frequency of the region $x<0.5$, against (mean) stepsize $\Delta t$. The different values for SamAdams were obtained by varying $\Delta \tau$ from 0.03 to 0.2. d) $\Delta t$ histograms for SamAdams run at three different $\Delta \tau$.
  • Figure 4: Left, Center: A BAOAB trajectory with stepsize $\Delta t=0.04$ shows an unstable evolution in $10^6$ steps. The descent into the funnel leads to a spike in both kinetic temperature and mean potential energy. Although short-lived, this type of event can, as here, corrupt long-term averages. Right: At longer times, these instabilities are inevitable for stepsizes greater than or equal to $\Delta t= 0.015$.
  • Figure 5: A SamAdams trajectory with mean stepsize $\langle \Delta t\rangle =0.16$ corrects the instability of the fixed stepsize method. The kinetic temperature and potential energy average converge to three significant digits of accuracy.
  • ...and 22 more figures
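
The star-shaped potential of Figure 1 is simple enough to write down directly, which makes it easy to see why a smaller stepsize is needed far from the origin. The sketch below codes $U$, its gradient and its Hessian, and evaluates the linear-stability heuristic $\Delta t < 2/\sqrt{\lambda_{\max}}$ (unit mass assumed). This heuristic and the function names are assumptions for illustration; they are not the procedure used to obtain the red line in the figure.

```python
import numpy as np

def U(x, y):
    """Star-shaped potential from Figure 1."""
    return x**2 + 1000.0 * x**2 * y**2 + y**2

def grad_U(x, y):
    """Analytic gradient of U."""
    return np.array([2.0 * x + 2000.0 * x * y**2,
                     2.0 * y + 2000.0 * x**2 * y])

def hessian_U(x, y):
    """Analytic Hessian of U."""
    off = 4000.0 * x * y
    return np.array([[2.0 + 2000.0 * y**2, off],
                     [off, 2.0 + 2000.0 * x**2]])

def local_step_bound(x, y):
    """Heuristic bound 2/sqrt(lambda_max) from the local harmonic approximation."""
    lam_max = np.linalg.eigvalsh(hessian_U(x, y))[-1]  # eigenvalues are ascending
    return 2.0 / np.sqrt(lam_max)

# The bound tightens moving outwards along the diagonal x = y
for a in (0.0, 0.1, 0.3):
    print(f"x = y = {a:.1f}: heuristic stepsize bound {local_step_bound(a, a):.4f}")
```

Along the diagonal $x=y=a$ the largest Hessian eigenvalue is $2+6000a^2$, so the heuristic bound shrinks with distance from the origin, consistent with the caption's remark that small stepsizes are needed only at the outer points.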

Theorems & Definitions (5)

  • Theorem 1
  • proof
  • Remark 1
  • Remark 2
  • Remark 3