Data Subsampling for Bayesian Neural Networks
Eiji Kawasaki, Markus Holzmann, Lawrence Adu-Gyamfi
TL;DR
This paper tackles the scalability bottleneck of Bayesian posterior sampling for neural networks by introducing Penalty Bayesian Neural Networks (PBNNs), which evaluate the likelihood on mini-batches and add a noise penalty to the Metropolis–Hastings acceptance probability to achieve unbiased posterior sampling. The mini-batch loss difference is modeled as a noisy estimate, and the penalty term accounts for its variance, enabling accurate sampling even with small batch sizes $n$ and multiple batches $M$. The authors develop a full algorithm (PBNN) and discuss extensions to Penalized Langevin Dynamics (PMLD), including MALA and ULA variants. The approach offers a way to calibrate predictive distributions through the choice of $n$ and to deploy the method in federated settings where data are decentralized. Empirical results on synthetic tasks and MNIST show robust predictive performance and improved calibration (reduced overconfidence) as the mini-batch size varies, highlighting the approach’s practical value for scalable Bayesian uncertainty quantification (UQ) in deep learning.
Abstract
Markov Chain Monte Carlo (MCMC) algorithms do not scale well to large datasets, which makes posterior sampling for neural networks difficult. In this paper, we propose Penalty Bayesian Neural Networks (PBNNs), a new algorithm that addresses scalability by evaluating the likelihood on subsampled batches of data (mini-batches) in a Bayesian inference context. PBNN avoids the biases inherent in naive subsampling techniques by incorporating a penalty term as part of a generalization of the Metropolis-Hastings algorithm. We show that PBNN is straightforward to integrate with existing MCMC frameworks, as the variance of the loss function merely reduces the acceptance probability. By comparing with alternative sampling strategies on both synthetic data and the MNIST dataset, we demonstrate that PBNN achieves good predictive performance even for small mini-batch sizes. We also show that PBNN provides a novel approach for calibrating the predictive distribution by varying the mini-batch size, significantly reducing predictive overconfidence.
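
To make the role of the penalty term concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard noise-penalty form of the acceptance rule for a symmetric proposal and Gaussian noise, $\min\left(1, \exp(-\hat{\delta} - \sigma^2/2)\right)$, where $\hat{\delta}$ is the noisy estimate of the loss difference (proposed minus current) and $\sigma^2$ is its variance. The function names `penalty_acceptance` and `estimate_delta` are hypothetical, and plugging in the sample variance over $M$ mini-batches is an approximation chosen for illustration; the estimator used in the paper may differ.

```python
import math
import statistics


def penalty_acceptance(delta_hat: float, sigma2: float) -> float:
    """Acceptance probability min(1, exp(-delta_hat - sigma2 / 2)).

    delta_hat: noisy estimate of the loss difference (proposed - current).
    sigma2:    variance of that estimate; a larger variance can only
               lower the acceptance probability.
    """
    exponent = -delta_hat - 0.5 * sigma2
    # Guard against overflow in exp() when the exponent is positive.
    return 1.0 if exponent >= 0.0 else math.exp(exponent)


def estimate_delta(batch_deltas):
    """Combine M per-mini-batch loss differences (M >= 2).

    Returns the mean difference and the estimated variance of that mean
    (sample variance divided by M). This plug-in estimate is an
    illustrative assumption.
    """
    m = len(batch_deltas)
    delta_hat = statistics.fmean(batch_deltas)
    sigma2 = statistics.variance(batch_deltas) / m
    return delta_hat, sigma2


# Hypothetical loss differences from M = 4 mini-batches.
delta_hat, sigma2 = estimate_delta([0.8, 1.1, 0.9, 1.2])
p_accept = penalty_acceptance(delta_hat, sigma2)
```

With `sigma2 = 0` the rule reduces to the usual Metropolis acceptance for a symmetric proposal, which is why the variance acts purely as a reduction of the acceptance probability.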
