Backpropagation-Free Metropolis-Adjusted Langevin Algorithm
Adam D. Cobb, Susmit Jha
TL;DR
This work introduces backpropagation-free gradient-based MCMC by integrating forward-mode automatic differentiation (AD) tangent vectors into Metropolis-adjusted Langevin dynamics, yielding four samplers: FMALA, Line-FMALA, PC-FMALA, and PC-Line-FMALA. Across a range of probabilistic models, including hierarchical distributions, Bayesian neural networks, and CNNs, it shows that forward-mode AD can match or exceed the sampling performance of reverse-mode MALA while reducing memory and time costs. The authors analyze the bias and variance introduced by sampling tangents uniformly from the sphere, propose dimension- and Hessian-based corrections, and show that the line-based and pre-conditioned variants often offer the best trade-offs in practice. Collectively, the results point to a practical route to scalable Bayesian inference in high-dimensional settings where backpropagation is costly or memory-bound, with PC-Line-FMALA frequently delivering the strongest performance.
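The dimension-based correction mentioned above rests on a standard identity for uniform-sphere tangents. The following is a worked sketch, not necessarily the paper's exact estimator or correction: write g = ∇ log π(θ) for the gradient of the log-density, v ~ Uniform(S^{d-1}) for the tangent, and g^T v for the directional derivative that one forward pass returns.

```latex
% Rotational symmetry gives E[v v^T] = cI; taking the trace, c d = E||v||^2 = 1.
\mathbb{E}[v v^\top] = \tfrac{1}{d} I_d
\;\Longrightarrow\;
\mathbb{E}\big[(g^\top v)\, v\big] = \mathbb{E}[v v^\top]\, g = \tfrac{1}{d}\, g .
% Rescaling by d removes the bias:
\hat{g} = d\,(g^\top v)\, v, \qquad \mathbb{E}[\hat{g}] = g,
% at the cost of a mean-squared error that grows linearly in d (using ||v|| = 1):
\mathbb{E}\,\|\hat{g} - g\|^2 = d^2\, g^\top \mathbb{E}[v v^\top]\, g - \|g\|^2 = (d-1)\,\|g\|^2 .
```

The unscaled estimator (g^T v)v therefore shrinks the gradient by a factor of 1/d, and rescaling it by d trades that bias for variance that grows with dimension.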
Abstract
Recent work on backpropagation-free learning has shown that forward-mode automatic differentiation (AD) can be used to optimize differentiable models. In this setting, forward-mode AD requires sampling a tangent vector for each forward pass of a model; the pass returns the model evaluation together with the directional derivative along that tangent. In this paper, we show how the sampling of this tangent vector can be incorporated into the proposal mechanism of the Metropolis-Adjusted Langevin Algorithm (MALA). In doing so, we introduce the first backpropagation-free gradient-based Markov chain Monte Carlo (MCMC) algorithm. We also extend this approach to a novel backpropagation-free, position-specific pre-conditioned forward-mode MALA that leverages Hessian information. Overall, we propose four new algorithms: Forward MALA (FMALA), Line Forward MALA (Line-FMALA), Pre-conditioned Forward MALA (PC-FMALA), and Pre-conditioned Line Forward MALA (PC-Line-FMALA). We highlight the reduced computational cost of the forward-mode samplers and show that forward-mode MALA is competitive with the original MALA and can even outperform it, depending on the probabilistic model. We include Bayesian inference results on a range of probabilistic models, including hierarchical distributions and Bayesian neural networks.
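The core mechanism of the abstract, a forward pass that returns the model value together with a directional derivative along a sampled tangent, can be illustrated in plain Python. This is a minimal, hypothetical sketch, not the authors' FMALA. It assumes dual-number forward-mode AD, tangents drawn uniformly from the unit sphere with the dimension scaling d(∇ log π · v)v, and a Metropolis-Hastings correction in which the forward and reverse proposals share the same tangent v. Because v is drawn independently of the state, each fixed-v kernel is reversible with respect to π, and so is their mixture. The names `Dual`, `jvp`, `sample_unit_sphere`, and `forward_mala_step` are illustrative only.

```python
import math
import random


class Dual:
    """Dual number val + tan*eps with eps**2 == 0; `tan` carries the directional derivative."""

    def __init__(self, val, tan=0.0):
        self.val, self.tan = val, tan

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.val + other.val, self.tan + other.tan)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule: (a + a'eps)(b + b'eps) = ab + (a b' + a' b) eps
        return Dual(self.val * other.val, self.val * other.tan + self.tan * other.val)

    __rmul__ = __mul__


def jvp(f, theta, v):
    """One forward pass: returns f(theta) and grad f(theta) . v, with no backward pass."""
    out = f([Dual(t, vi) for t, vi in zip(theta, v)])
    return out.val, out.tan


def sample_unit_sphere(d, rng):
    """Tangent drawn uniformly from the unit sphere S^{d-1} (a normalized Gaussian)."""
    z = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(zi * zi for zi in z))
    return [zi / norm for zi in z]


def forward_mala_step(log_density, theta, step, rng):
    """Hypothetical forward-mode MALA step (an assumption, not the paper's exact proposal).

    The same tangent v is used for the forward and the reverse move.
    """
    d = len(theta)
    v = sample_unit_sphere(d, rng)

    def drift(point):
        # d * (grad . v) * v is an unbiased estimate of grad log pi(point) for uniform-sphere v
        logp, dir_deriv = jvp(log_density, point, v)
        return logp, [step * d * dir_deriv * vi for vi in v]

    def log_q(to, frm, drift_frm):
        # log N(to; frm + drift_frm, 2*step*I), up to a constant that cancels in the ratio
        return -sum((a - b - c) ** 2 for a, b, c in zip(to, frm, drift_frm)) / (4.0 * step)

    logp, drift_cur = drift(theta)
    prop = [t + dr + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0) for t, dr in zip(theta, drift_cur)]
    logp_prop, drift_prop = drift(prop)
    log_alpha = logp_prop - logp + log_q(theta, prop, drift_prop) - log_q(prop, theta, drift_cur)
    if log_alpha >= 0.0 or rng.random() < math.exp(log_alpha):
        return prop, True
    return theta, False


# Usage: anisotropic Gaussian target with variances 1 and 0.25 (log-density up to a constant)
def log_density(theta):
    return -0.5 * (theta[0] * theta[0] + 4.0 * theta[1] * theta[1])


rng = random.Random(0)
theta, accepted, samples = [2.0, -1.0], 0, []
for i in range(20000):
    theta, acc = forward_mala_step(log_density, theta, 0.1, rng)
    accepted += acc
    if i >= 1000:  # discard burn-in
        samples.append(theta)
accept_rate = accepted / 20000
m0 = sum(s[0] ** 2 for s in samples) / len(samples)  # estimates E[theta_0^2] = 1.0
m1 = sum(s[1] ** 2 for s in samples) / len(samples)  # estimates E[theta_1^2] = 0.25
```

Each step costs two forward passes (at the current and the proposed state) and no backward pass. The paper's line-based and pre-conditioned variants modify the proposal and are not shown in this sketch.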
