Sampling from Bayesian Neural Network Posteriors with Symmetric Minibatch Splitting Langevin Dynamics

Daniel Paulin, Peter A. Whalley, Neil K. Chada, Benedict Leimkuhler

TL;DR

The paper addresses scalable Bayesian posterior sampling for large neural networks by introducing SMS-UBU, a symmetric minibatch splitting integrator for kinetic Langevin dynamics that retains second-order accuracy in the stepsize while using a single minibatch per iteration. It proves Wasserstein contraction and $O(h^2)$ error bounds, including an $O(h^2 d^{1/2})$ bias bound under Hessian Lipschitz conditions, and demonstrates improved posterior calibration on multiple image datasets. It also proposes a practical Bayesian UQ strategy based on localized posteriors around SWA optima, which enables efficient exploration and ensemble-based uncertainty estimates. Empirically, SMS-UBU improves calibration over standard training and SWA and scales to CNNs, including wide networks on CIFAR-10.

Abstract

We propose a scalable kinetic Langevin dynamics algorithm for sampling parameter spaces of big data and AI applications. Our scheme combines a symmetric forward/backward sweep over minibatches with a symmetric discretization of Langevin dynamics. For a particular Langevin splitting method (UBU), we show that the resulting Symmetric Minibatch Splitting-UBU (SMS-UBU) integrator has bias $O(h^2 d^{1/2})$ in dimension $d>0$ with stepsize $h>0$, despite only using one minibatch per iteration, thus providing excellent control of the sampling bias as a function of the stepsize. We apply the algorithm to explore local modes of the posterior distribution of Bayesian neural networks (BNNs) and evaluate the calibration performance of the posterior predictive probabilities for neural networks with convolutional neural network architectures for classification problems on three different datasets (Fashion-MNIST, Celeb-A and chest X-ray). Our results indicate that BNNs sampled with SMS-UBU can offer significantly better calibration performance compared to standard methods of training and stochastic weight averaging.
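The combination of a symmetric forward/backward sweep over minibatches with a UBU step can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the authors' implementation: the names `u_step`, `sms_ubu` and `grad_batch` are hypothetical, the potential is assumed to split as $f = \sum_{i=1}^{N_m} f_i$ with the full gradient estimated by $N_m \nabla f_i$, a fresh permutation of the minibatches is assumed to be drawn for every forward/backward pair, and the Gaussian noise of the U step is sampled from its exact $2\times 2$ covariance via a Cholesky factor. The toy target is a standard Gaussian whose gradient is split into four shifted pieces.

```python
import numpy as np

def u_step(x, v, t, gamma, rng):
    """Exact solve of dx = v dt, dv = -gamma*v dt + sqrt(2*gamma) dW over time t."""
    eta = np.exp(-gamma * t)
    # Per-coordinate covariance of the Gaussian noise entering (x, v).
    var_x = (2.0 / gamma) * (t - 2.0 * (1.0 - eta) / gamma
                             + (1.0 - eta**2) / (2.0 * gamma))
    var_v = 1.0 - eta**2
    cov_xv = (1.0 - eta) ** 2 / gamma
    chol = np.linalg.cholesky(np.array([[var_x, cov_xv], [cov_xv, var_v]]))
    noise = chol @ rng.standard_normal((2, x.size))
    return x + (1.0 - eta) / gamma * v + noise[0], eta * v + noise[1]

def sms_ubu(grad_batch, x0, n_batches, n_epoch_pairs, h, gamma, rng):
    """Sketch of SMS-UBU: one minibatch gradient per UBU step, mirrored sweeps."""
    x, v = x0.copy(), rng.standard_normal(x0.shape)
    samples, sweeps = [], []
    for _ in range(n_epoch_pairs):
        perm = rng.permutation(n_batches)
        # Forward sweep followed by the same batches in reverse order.
        sweep = np.concatenate([perm, perm[::-1]])
        for i in sweep:
            x, v = u_step(x, v, h / 2, gamma, rng)      # U: half step
            v = v - h * n_batches * grad_batch(x, i)    # B: minibatch gradient kick
            x, v = u_step(x, v, h / 2, gamma, rng)      # U: half step
            samples.append(x.copy())
        sweeps.append(sweep)
    return np.array(samples), sweeps

# Toy target (assumption): f(x) = |x|^2 / 2, split so that sum_i grad f_i(x) = x.
centers = np.array([-1.0, -0.5, 0.5, 1.0])  # shifts sum to zero
n_batches = len(centers)

def grad_batch(x, i):
    return (x - centers[i]) / n_batches

rng = np.random.default_rng(0)
samples, sweeps = sms_ubu(grad_batch, np.zeros(2), n_batches,
                          n_epoch_pairs=4000, h=0.1, gamma=2.0, rng=rng)
print("sample mean:", samples.mean(), "sample variance:", samples.var())
```

For this toy target the sample mean and variance should land near 0 and 1 respectively. Each entry of `sweeps` reads the same forwards and backwards, which is the symmetry the scheme relies on.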

Paper Structure

This paper contains 23 sections, 11 theorems, 88 equations, 7 figures, 4 tables, 7 algorithms.

Key Result

Theorem 5

Consider the SMS-UBU scheme with friction parameter $\gamma >0$, stepsize $h>0$ and initial measure $\overline{\pi}_{0}$ and assume that $h<\frac{1}{2\gamma}$ and $\gamma \geq \sqrt{8M}$. Let the potential $f$ be $M$-$\nabla$Lipschitz and of the form $f = \sum^{N_{m}}_{i=1}f_{i}$, where each $f_{i}$ and if $f$ is $M^{s}_{1}$-strongly Hessian Lipschitz we have that If we impose the stronger stepsi

Figures (7)

  • Figure 1: Wasserstein bias of stochastic integrators of kinetic Langevin dynamics for a 1D Gaussian target
  • Figure 2: Top: average bias of probability of true class on test dataset for five integrators as a function of stepsize. The other plots show the effect of stepsize on accuracy and calibration performance (NLL, ACE, and RPS).
  • Figure 3: Left: trace plots for 4 SMS-UBU chains initialized at Gaussian perturbations of SWA weights. Right: Gelman-Rubin diagnostic $\hat{R}$ of average training loss computed for 16 SWA weights $x^*$ obtained independently (four parallel chains each). The four chains converge to the same level, and $\hat{R}$ values are close to unity, indicating excellent mixing.
  • Figure 4: Accuracy and calibration results for a CNN-based network on Fashion-MNIST.
  • Figure 5: Accuracy and calibration results for a CNN-based network for classifying brown/blonde hair colour on the Celeb-A dataset.
  • ...and 2 more figures
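The Gelman-Rubin diagnostic $\hat{R}$ mentioned in the caption of Figure 3 compares the variance between parallel chains with the variance within each chain. The sketch below uses the classical (non-split) formula on a scalar summary such as the average training loss. The function name `gelman_rubin` and the synthetic chains are assumptions for illustration, and the exact variant used in the paper may differ.

```python
import numpy as np

def gelman_rubin(chains):
    """Classical R-hat for an array of shape (n_chains, n_draws) of a scalar quantity."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    within = chains.var(axis=1, ddof=1).mean()   # W: mean within-chain variance
    between = n * chain_means.var(ddof=1)        # B: scaled variance of chain means
    var_plus = (n - 1) / n * within + between / n
    return float(np.sqrt(var_plus / within))

# Illustrative use with four synthetic chains drawn from the same distribution.
rng = np.random.default_rng(1)
well_mixed = rng.standard_normal((4, 1000))
print("R-hat (same distribution):", gelman_rubin(well_mixed))

# Chains stuck at different levels give a value well above 1.
stuck = well_mixed + np.array([[0.0], [1.0], [2.0], [3.0]])
print("R-hat (shifted chains):", gelman_rubin(stuck))
```

Values close to 1 indicate that the chains agree on the distribution of the monitored quantity. Values clearly above 1 suggest the chains have not yet mixed.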

Theorems & Definitions (33)

  • Definition 1
  • Remark 1
  • Remark 2
  • Definition 2
  • Definition 3: Weighted Euclidean norm
  • Remark 3
  • Definition 4: $p$-Wasserstein distance
  • Remark 4
  • Theorem 5
  • Theorem 6
  • ...and 23 more