A Langevin sampling algorithm inspired by the Adam optimizer
Benedict Leimkuhler, René Lohmann, Peter Whalley
TL;DR
We address the challenge of efficiently sampling from the canonical density $\pi_{\beta}(x) \propto \exp(-\beta U(x))$ in high dimensions by introducing SamAdams, an adaptive-stepsize Langevin framework that augments the phase space with an auxiliary variable $\zeta$ and employs a Sundman time transformation $dt=\psi(\zeta)\,d\tau$. The dynamics are driven by an Adam-inspired monitor function $g$ that yields moving-average control of the stepsize $\Delta t=\psi(\zeta)\Delta\tau$, while reweighting via $\psi(\zeta)$ recovers canonical expectations. The method is discretized with a splitting scheme (ZBAOABZ) and has provable weak convergence of order $O(\Delta\tau^2)$. Empirically, SamAdams improves stability and exploration speed across diverse problems, from asymmetric double wells and Neal's funnel to MNIST-based Bayesian neural networks, often allowing substantially larger mean stepsizes than fixed-step schemes without sacrificing accuracy. The framework is modular and compatible with existing fixed-step integrators, and it may benefit large-scale Bayesian inference and machine learning by reducing tuning effort and improving robustness in complex loss landscapes.
Abstract
We present a framework for adaptive-stepsize MCMC sampling based on time-rescaled Langevin dynamics, in which the stepsize variation is dynamically driven by an additional degree of freedom. Our approach augments the phase space by an additional variable, which in turn defines a time reparameterization. The use of an auxiliary relaxation equation allows accumulation of a moving average of a local monitor function and provides for precise control of the timestep while circumventing the need to modify the drift term in the physical system. Our algorithm is straightforward to implement and can be readily combined with any off-the-peg fixed-stepsize Langevin integrator. As a particular example, we consider control of the stepsize by monitoring the norm of the log-posterior gradient, which takes inspiration from the Adam optimizer. The stepsize is automatically reduced in regions of steep change of the log posterior and increased on plateaus, improving numerical stability and convergence speed. As in Adam, the stepsize variation depends on the recent history of the gradient norm, which enhances stability and improves accuracy compared to more immediate control approaches. We demonstrate the potential benefit of this method, both in accuracy and in stability, in numerical experiments including Neal's funnel and a Bayesian neural network for classification of MNIST data.
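To make the mechanism concrete, the following is a minimal Python sketch of one way such a scheme could look. It is not the authors' reference implementation. It assumes a relaxation equation $d\zeta/d\tau = -\alpha\zeta + g$ with monitor $g = \|\nabla U(x)\|^2$, a bounded rescaling $\psi(\zeta) = m(\zeta^r + M)/(\zeta^r + m)$, and a BAOAB step wrapped between two half-steps in $\zeta$ (one reading of the ZBAOABZ ordering). The function names, default parameters, exponent of the monitor, and the form of $\psi$ are all illustrative assumptions.

```python
import numpy as np


def psi(zeta, m=0.1, M=10.0, r=1.0):
    """Assumed bounded rescaling: psi -> M as zeta -> 0, psi -> m as zeta -> inf."""
    z = zeta ** r
    return m * (z + M) / (z + m)


def z_half_step(zeta, g_val, alpha, h):
    """Exact flow of d(zeta)/d(tau) = -alpha*zeta + g over time h, with g frozen."""
    decay = np.exp(-alpha * h)
    return decay * zeta + (1.0 - decay) * g_val / alpha


def samadams_sketch(grad_U, x0, n_steps, dtau=0.1, gamma=1.0, beta=1.0,
                    alpha=1.0, seed=0):
    """Hypothetical adaptive-stepsize Langevin sampler (Z, BAOAB, Z per step).

    x0 is a 1-D array. Returns positions and weights psi(zeta); canonical
    averages are estimated as sum(w * f(x)) / sum(w).
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    p = np.zeros_like(x)
    force = grad_U(x)
    zeta = np.dot(force, force) / alpha  # stationary value for the initial g
    xs = np.empty((n_steps,) + x.shape)
    ws = np.empty(n_steps)
    for n in range(n_steps):
        # Z: relax zeta toward the monitor (assumed g = squared gradient norm)
        zeta = z_half_step(zeta, np.dot(force, force), alpha, 0.5 * dtau)
        dt = psi(zeta) * dtau  # adaptive physical stepsize for this step
        p = p - 0.5 * dt * force                      # B: half kick
        x = x + 0.5 * dt * p                          # A: half drift
        c = np.exp(-gamma * dt)                       # O: exact OU update
        p = c * p + np.sqrt((1.0 - c * c) / beta) * rng.standard_normal(x.shape)
        x = x + 0.5 * dt * p                          # A: half drift
        force = grad_U(x)
        p = p - 0.5 * dt * force                      # B: half kick
        zeta = z_half_step(zeta, np.dot(force, force), alpha, 0.5 * dtau)  # Z
        xs[n] = x
        ws[n] = psi(zeta)  # reweighting factor for canonical averages
    return xs, ws


if __name__ == "__main__":
    # Toy target: standard Gaussian, U(x) = x^2 / 2, so grad U(x) = x.
    xs, ws = samadams_sketch(lambda x: x, np.zeros(1), n_steps=20000)
    print("weighted E[x^2] estimate:", np.sum(ws * xs[:, 0] ** 2) / np.sum(ws))
```

In this sketch the drift and noise terms of the underlying integrator are untouched; only the stepsize passed to it changes, which is why a different fixed-stepsize scheme could be swapped in for the BAOAB block. Because $\zeta$ relaxes at rate $\alpha$ rather than tracking $g$ instantaneously, the stepsize reflects the recent history of the gradient norm.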
