Adaptive Stepsizing for Stochastic Gradient Langevin Dynamics in Bayesian Neural Networks

Rajit Rajpal, Benedict Leimkuhler, Yuanhao Jiang

TL;DR

This work addresses the sensitivity of stochastic gradient MCMC methods to stepsize by introducing SA-SGLD, an adaptive stepsize scheme based on SamAdams time rescaling. By modulating the step via a gradient-norm monitor and a Sundman-type transform, SA-SGLD preserves the correct invariant distribution while avoiding costly divergence corrections, improving stability and mixing in Bayesian neural network sampling. Theoretical results establish uniform moment bounds and ergodicity with an $O(h)$ bias for the weighted time-average, and empirical results demonstrate enhanced posterior exploration on high-curvature landscapes and improved predictive calibration for BNNs with sharp priors. The approach offers a practical, scalable alternative for Bayesian deep learning that delivers more accurate posterior samples without significant computational overhead, enabling robust uncertainty estimation in complex models.
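As a hedged illustration of what a Sundman-type transform means in this setting: it reparametrizes time so that a fixed step $h$ in a fictitious time $\tau$ corresponds to a variable step in real time $t$. The symbols $\psi$, $\zeta$, $\tau$, $\Delta t_n$, $m$, $M$ and $\varphi$ below are notation assumed for this sketch, not taken from the summary. With a bounded rescaling factor $\psi$ driven by a monitor variable $\zeta$, the relation reads

$$\frac{\mathrm{d}t}{\mathrm{d}\tau} = \psi(\zeta), \qquad \Delta t_n = h\,\psi(\zeta_n), \qquad 0 < m \le \psi(\zeta) \le M .$$

One natural reading of the "weighted time-average" is that each sample is weighted by the real time it represents:

$$\hat{\varphi}_N = \frac{\sum_{n=1}^{N} \varphi(x_n)\,\Delta t_n}{\sum_{n=1}^{N} \Delta t_n} .$$

The exact estimator and the precise statement of its $O(h)$ bias are those given in the paper.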

Abstract

Bayesian neural networks (BNNs) require scalable sampling algorithms to approximate posterior distributions over parameters. Existing stochastic gradient Markov chain Monte Carlo (SGMCMC) methods are highly sensitive to the choice of stepsize, and adaptive variants such as pSGLD typically fail to sample the correct invariant measure without the addition of a costly divergence correction term. In this work, we build on the recently proposed "SamAdams" framework for timestep adaptation (Leimkuhler, Lohmann, and Whalley 2025) and introduce an adaptive scheme, SA-SGLD, which employs time rescaling to modulate the stepsize according to a monitored quantity (typically the local gradient norm). SA-SGLD can automatically shrink stepsizes in regions of high curvature and expand them in flatter regions, improving both stability and mixing without introducing bias. We show that our method can achieve more accurate posterior sampling than SGLD on high-curvature 2D toy examples and in image classification with BNNs using sharp priors.
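The stepsize modulation described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' SA-SGLD implementation: the bounded map `psi`, its parameters `m`, `M`, `r`, the relaxation rate `alpha`, the exponential-averaging update for the monitor variable `zeta`, and the stepsize-weighted average are all hypothetical choices made for this example.

```python
import numpy as np


def psi(zeta, m=0.1, M=10.0, r=0.5):
    """Hypothetical bounded stepsize factor (an assumption of this sketch).

    Equals M at zeta = 0 and decreases toward m as zeta grows, so large
    monitored gradient norms shrink the step and small ones enlarge it.
    """
    z = zeta ** r
    return m * (z + M) / (z + m)


def adaptive_sgld(grad_fn, x0, h=1e-2, alpha=1.0, n_steps=5000, seed=0):
    """SGLD-style sampler with a gradient-norm-driven stepsize (sketch only).

    grad_fn(x, rng) returns a (possibly noisy) gradient of the negative
    log posterior at x. Returns the samples and the stepsizes used.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    zeta = 0.0
    rho = np.exp(-alpha * h)  # decay factor of the monitor variable per step
    xs = np.empty((n_steps, x.size))
    dts = np.empty(n_steps)
    for n in range(n_steps):
        g = grad_fn(x, rng)
        # Monitor: exponentially averaged squared gradient norm (assumed form).
        zeta = rho * zeta + (1.0 - rho) / alpha * float(g @ g)
        dt = h * psi(zeta)  # adaptive stepsize, always within [h*m, h*M]
        # Euler-Maruyama step of overdamped Langevin dynamics with step dt.
        x = x - dt * g + np.sqrt(2.0 * dt) * rng.standard_normal(x.shape)
        xs[n] = x
        dts[n] = dt
    return xs, dts


def weighted_average(values, dts):
    """Average of per-sample values, weighting each sample by its stepsize."""
    return np.average(values, weights=dts, axis=0)


def noisy_gaussian_grad(x, rng):
    """Toy target: standard Gaussian, with noise mimicking minibatching."""
    return x + 0.1 * rng.standard_normal(x.shape)
```

A call such as `xs, dts = adaptive_sgld(noisy_gaussian_grad, x0=[1.0, -1.0])` followed by `weighted_average(xs**2, dts)` gives a stepsize-weighted estimate of the second moments. Weighting by `dts` reflects the idea that each sample stands for a different amount of real time once the step is rescaled.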

Paper Structure

This paper contains 19 sections, 4 theorems, 51 equations, 4 figures, 1 table.

Key Result

Lemma 1

Under the stated assumptions and for the constants defined in the paper, if $h>0$ is small enough, the iterates satisfy a uniform moment bound.

Figures (4)

  • Figure 1: Müller--Brown potential. SA-SGLD adapts its step size to local curvature, enabling transitions across energy barriers—analogous to escaping local modes in complex BNN posteriors.
  • Figure 2: Star potential. SA-SGLD dynamically scales its step size, entering narrow high-curvature funnels that SGLD fails to explore.
  • Figure 3: SGLD vs. SA-SGLD sampling a BNN with a horseshoe prior on MNIST. The log probability shown is computed using the entire ensemble of samples collected up to that epoch.
  • Figure 4: SA-SGLD's robustness with large stepsizes.

Theorems & Definitions (6)

  • Lemma 1: Uniform moment bounds
  • Theorem 1: Ergodicity and $O(h)$ bias
  • Lemma 1: Uniform moment bounds
  • proof
  • Theorem 1: Ergodicity and $O(h)$ bias
  • proof