An Inertial Langevin Algorithm
Alexander Falk, Andreas Habring, Christoph Griesbacher, Thomas Pock
TL;DR
This work introduces the Inertial Langevin Algorithm (ILA), a momentum-augmented discretization of Langevin dynamics designed to accelerate sampling from Gibbs distributions $\pi(x) \propto \exp(-U(x))$. By identifying ILA as a discretization of kinetic Langevin dynamics, the authors establish geometric ergodicity in continuous and discrete time and derive a $\mathcal{W}_2$-bias bound that scales as $\mathcal{O}(\sqrt{\Delta t})$, while enabling smaller friction parameters for faster mixing. The paper also elucidates a close link between ILA and over-relaxed Gibbs sampling, and demonstrates substantial empirical acceleration across toy, denoising, and molecular-structure-generation tasks, including high-dimensional and non-smooth settings. The combination of theoretical guarantees and broad numerical validation indicates that momentum-based sampling can significantly improve mixing and practical performance beyond traditional strongly convex regimes. Overall, ILA provides a principled, faster alternative to standard Langevin-based samplers with concrete guarantees and versatile applicability across inverse problems and machine learning tasks.
Abstract
We present a novel method for drawing samples from Gibbs distributions with densities of the form $\pi(x) \propto \exp(-U(x))$. The method accelerates the unadjusted Langevin algorithm by introducing an inertia term similar to Polyak's heavy ball method, together with a corresponding noise rescaling. Interpreting the scheme as a discretization of \emph{kinetic} Langevin dynamics, we prove ergodicity (in continuous and discrete time) for twice continuously differentiable, strongly convex, and $L$-smooth potentials and bound the bias of the discretization to the target in Wasserstein-2 distance. In particular, the presented proofs allow for smaller friction parameters in the kinetic Langevin diffusion than in the existing literature. Moreover, we show the close ties of the proposed method to the over-relaxed Gibbs sampler. The scheme is tested in an extensive set of numerical experiments covering simple toy examples, total variation image denoising, and the complex task of maximum likelihood learning of an energy-based model for molecular structure generation. The experimental results confirm the acceleration provided by the proposed scheme even beyond the strongly convex and $L$-smooth setting.
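To make the relation between the unadjusted Langevin algorithm (ULA) and a momentum-based sampler concrete, the following is a minimal sketch and not the authors' ILA update, whose exact form is not given in this abstract. It assumes the standard ULA step $x_{k+1} = x_k - \tau \nabla U(x_k) + \sqrt{2\tau}\,\xi_k$ and a plain semi-implicit Euler discretization of kinetic Langevin dynamics $dX = V\,dt$, $dV = -\nabla U(X)\,dt - \gamma V\,dt + \sqrt{2\gamma}\,dW$. The function names, step sizes, friction value, and the standard Gaussian target $U(x) = \tfrac{1}{2}\|x\|^2$ are illustrative assumptions.

```python
import numpy as np


def ula(grad_U, x0, step, n_steps, rng):
    """Unadjusted Langevin algorithm: gradient step plus Gaussian noise."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x - step * grad_U(x) + np.sqrt(2.0 * step) * noise
    return x


def kinetic_langevin_euler(grad_U, x0, h, friction, n_steps, rng):
    """Semi-implicit Euler discretization of kinetic Langevin dynamics.

    Illustrative only (an assumption, not the paper's scheme).
    Eliminating v gives a heavy-ball-like recursion in x with
    momentum (1 - friction * h), gradient step h**2, and noise
    scaled by h * sqrt(2 * friction * h).
    """
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)  # velocity (momentum) variable, started at rest
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        v = (1.0 - friction * h) * v - h * grad_U(x) \
            + np.sqrt(2.0 * friction * h) * noise
        x = x + h * v  # position update uses the new velocity
    return x


rng = np.random.default_rng(0)


def grad_U(x):
    # Assumed toy target: U(x) = 0.5 * x**2, a standard Gaussian.
    return x


# Each entry of x0 is an independent 1D chain, started far from the mode.
x0 = np.full(4000, 3.0)
samples_ula = ula(grad_U, x0, step=0.01, n_steps=1000, rng=rng)
samples_kin = kinetic_langevin_euler(
    grad_U, x0, h=0.1, friction=1.0, n_steps=500, rng=rng
)

print("ULA     mean/var:", samples_ula.mean(), samples_ula.var())
print("kinetic mean/var:", samples_kin.mean(), samples_kin.var())
```

With `h = 0.1`, the effective gradient step `h**2 = 0.01` of the kinetic sketch matches the ULA step used here, so the two samplers differ only in the momentum term and the noise scaling. Both are unadjusted schemes, so their empirical moments are expected to be close to, but not exactly equal to, those of the target.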
