Robust Approximate Sampling via Stochastic Gradient Barker Dynamics
Lorenzo Mauri, Giacomo Zanella
TL;DR
This work extends Barker's robust MCMC proposal to stochastic-gradient settings, producing the stochastic-gradient Barker dynamics (SGBD). It analyzes bias caused by minibatch gradient noise and proposes a corrected estimator (c-SGBD) based on a normal-noise assumption to mitigate bias, plus an extreme variant (e-SGBD) for high-noise regimes. Empirical results across skewed, ill-conditioned, and high-dimensional Bayesian problems show SGBD to be more robust to hyperparameter choices and gradient heterogeneity than SGLD, with c-SGBD often improving accuracy and e-SGBD offering fast convergence. The approach provides a practical, robust alternative to SGLD for large-scale Bayesian inference with complex posteriors.
Abstract
Stochastic gradient (SG) Markov chain Monte Carlo (MCMC) algorithms are popular for Bayesian sampling with large datasets. However, they come with few theoretical guarantees, and assessing their empirical performance is non-trivial. In this context, it is crucial to develop algorithms that are robust to the choice of hyperparameters and to gradient heterogeneity since, in practice, both the choice of step size and the behaviour of the target gradients induce hard-to-control biases in the invariant distribution. In this work we introduce the stochastic gradient Barker dynamics (SGBD) algorithm, which extends the recently developed Barker MCMC scheme, a robust alternative to Langevin-based sampling algorithms, to the stochastic gradient framework. We characterize the impact of stochastic gradients on the Barker transition mechanism and develop a bias-corrected version that, under suitable assumptions, eliminates the error due to the gradient noise in the proposal. We illustrate the performance on a number of high-dimensional examples, showing that SGBD is more robust to hyperparameter tuning and to irregular behavior of the target gradients than the popular stochastic gradient Langevin dynamics algorithm.
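To make the Barker transition mechanism mentioned above concrete, the following is a minimal sketch of one uncorrected SGBD-style update, not the authors' reference implementation. It assumes Gaussian increments and a coordinate-wise sign flip whose probability is a logistic function of the increment times the stochastic gradient; the bias-corrected and extreme variants are not shown. The function names (`sgbd_step`, `noisy_grad_std_normal`, `run_chain`), the toy standard-normal target, and the artificial gradient noise are all hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sgbd_step(theta, stoch_grad, step_size, rng):
    """One Barker-type update using a (possibly noisy) gradient estimate.

    For each coordinate, draw an increment z ~ N(0, step_size^2), then keep
    its sign with probability sigmoid(z * g) and flip it otherwise, where g
    is the stochastic gradient of the log-target in that coordinate.
    There is no accept/reject step in this sketch.
    """
    new_theta = []
    for theta_i, g_i in zip(theta, stoch_grad):
        z = rng.gauss(0.0, step_size)
        p_keep = sigmoid(z * g_i)
        b = 1.0 if rng.random() < p_keep else -1.0
        new_theta.append(theta_i + b * z)
    return new_theta


def noisy_grad_std_normal(theta, noise_sd, rng):
    """Toy stochastic gradient: the exact gradient of a standard normal
    log-density (-theta) plus artificial Gaussian noise (an assumption
    standing in for minibatch noise)."""
    return [-t + rng.gauss(0.0, noise_sd) for t in theta]


def run_chain(n_iter, step_size=0.3, noise_sd=0.5, seed=0):
    """Run the sketch on a 1-D standard normal target and return the draws."""
    rng = random.Random(seed)
    theta = [0.0]
    samples = []
    for _ in range(n_iter):
        g = noisy_grad_std_normal(theta, noise_sd, rng)
        theta = sgbd_step(theta, g, step_size, rng)
        samples.append(theta[0])
    return samples


if __name__ == "__main__":
    draws = run_chain(20000)
    print("sample mean:", sum(draws) / len(draws))
```

Because the gradient enters only through a bounded probability, the size of each move is always that of the drawn increment, however large the gradient is. This is one way to see why a Barker-type update can be less sensitive to large or irregular gradients than a Langevin-type update, whose drift grows linearly with the gradient.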
