Parameter Expanded Stochastic Gradient Markov Chain Monte Carlo
Hyunsu Kim, Giung Nam, Chulhee Yun, Hongseok Yang, Juho Lee
TL;DR
This work tackles the limited sample diversity of stochastic gradient MCMC (SGMCMC) methods in Bayesian neural networks by introducing Parameter Expanded SGMCMC (PX-SGMCMC), which reparameterizes each weight matrix as W^{(l)} = P^{(l)} V^{(l)} Q^{(l)} so that gradient updates on the expanded factors act as preconditioned updates on W^{(l)}. Theoretical analysis links the depth of the expansion and the singular-value dynamics to improved exploration, while extensive experiments on CIFAR-10 and out-of-distribution (OOD) benchmarks show enhanced posterior diversity, better uncertainty estimation, and robustness comparable to HMC without extra inference cost. The approach yields faster mixing and stronger ensemble performance under distribution shifts and data augmentations, making it a scalable, practical way to improve posterior sampling in Bayesian deep learning. The paper also discusses architectural adaptations and empirical observations of loss landscapes and trajectory diversity, and points to broader applications of parameter expansion in SGMCMC.
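The preconditioning effect of the reparameterization can be made concrete with a short chain-rule calculation. The following is only a sketch under simplifying assumptions that are not taken from the paper: a plain gradient step with step size \epsilon on each factor, noise and prior terms omitted, layer superscripts dropped, and G = \nabla_W L introduced here as shorthand for the gradient with respect to the collapsed weight W = P V Q.

```latex
% Gradients of the expanded factors, by the chain rule applied to W = P V Q
\nabla_P L = G\,(VQ)^{\top}, \qquad
\nabla_V L = P^{\top} G\, Q^{\top}, \qquad
\nabla_Q L = (PV)^{\top} G .

% Implied first-order change in W after one gradient step on (P, V, Q)
\Delta W
  = \Delta P\, V Q + P\, \Delta V\, Q + P V\, \Delta Q + O(\epsilon^{2})
  = -\epsilon \left[ G\,(VQ)^{\top}(VQ) + P P^{\top} G\, Q^{\top} Q + (PV)(PV)^{\top} G \right] + O(\epsilon^{2}).
```

Under these assumptions, the update seen by W is the ordinary gradient G multiplied on the left and right by matrices built from the current factors, which is one way to read "preconditioning" here. The exact form analyzed in the paper may differ.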
Abstract
Bayesian Neural Networks (BNNs) provide a promising framework for modeling predictive uncertainty and enhancing out-of-distribution (OOD) robustness by estimating the posterior distribution of network parameters. Stochastic Gradient Markov Chain Monte Carlo (SGMCMC) is one of the most powerful methods for scalable posterior sampling in BNNs, achieving efficiency by combining stochastic gradient descent with second-order Langevin dynamics. However, SGMCMC often suffers from limited sample diversity in practice, which affects uncertainty estimation and model performance. We propose a simple yet effective approach to enhance sample diversity in SGMCMC without the need for tempering or running multiple chains. Our approach reparameterizes the neural network by decomposing each of its weight matrices into a product of matrices, resulting in a sampling trajectory that better explores the target parameter space. This approach produces a more diverse set of samples, allowing faster mixing within the same computational budget. Notably, our sampler achieves these improvements without increasing the inference cost compared to the standard SGMCMC. Extensive experiments on image classification tasks, including OOD robustness, diversity, loss surface analyses, and a comparative study with Hamiltonian Monte Carlo, demonstrate the superiority of the proposed approach.
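To illustrate the idea of sampling in an expanded parameter space and then collapsing back to the original weights, here is a minimal NumPy sketch. It is not the authors' implementation: it assumes a single linear layer with a toy squared-error energy and no prior term, uses a plain SGLD update in place of whichever sampler the paper uses, and initializes P and Q to identity matrices. The names `collapse`, `expanded_grads`, and `sgld_step` are hypothetical helpers introduced for this example.

```python
import numpy as np

def collapse(P, V, Q):
    """Merge the expanded factors back into one weight matrix W = P V Q."""
    return P @ V @ Q

def expanded_grads(G, P, V, Q):
    """Chain rule: turn G = dU/dW into gradients for P, V and Q."""
    gP = G @ (V @ Q).T
    gV = P.T @ G @ Q.T
    gQ = (P @ V).T @ G
    return gP, gV, gQ

def sgld_step(params, grads, step_size, rng):
    """One plain SGLD update (assumed sampler): gradient step plus Gaussian noise."""
    return [
        p - step_size * g + np.sqrt(2.0 * step_size) * rng.standard_normal(p.shape)
        for p, g in zip(params, grads)
    ]

rng = np.random.default_rng(0)
m, n = 3, 4                                   # W has shape (m, n)
P = np.eye(m)                                 # assumed identity initialization
V = 0.1 * rng.standard_normal((m, n))
Q = np.eye(n)

# Toy data for the energy U(W) = 0.5 * ||X W^T - Y||^2 (illustrative only).
X = rng.standard_normal((32, n))
Y = rng.standard_normal((32, m))

samples = []
for t in range(200):
    W = collapse(P, V, Q)
    G = (X @ W.T - Y).T @ X                   # dU/dW for the toy energy
    P, V, Q = sgld_step([P, V, Q], expanded_grads(G, P, V, Q), 1e-4, rng)
    if t % 20 == 0:
        # Store only the collapsed matrix: it has the original (m, n) shape,
        # so a forward pass with a stored sample costs the same as before.
        samples.append(collapse(P, V, Q))
```

Each stored sample is a single matrix of the original shape. That is consistent with the abstract's claim that inference cost does not increase, on the assumption that the factors are multiplied together before prediction.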
