Parameter Expanded Stochastic Gradient Markov Chain Monte Carlo

Hyunsu Kim, Giung Nam, Chulhee Yun, Hongseok Yang, Juho Lee

TL;DR

This work tackles the limited sample diversity of stochastic gradient MCMC methods in Bayesian neural networks by introducing Parameter Expanded SGMCMC (PX-SGMCMC), which reparameterizes each weight matrix as $\mathbf{W}^{(l)} = \mathbf{P}^{(l)} \mathbf{V}^{(l)} \mathbf{Q}^{(l)}$ to precondition gradient updates. Theoretical analysis links the depth of the expansion and the singular-value dynamics to improved exploration, while extensive experiments on CIFAR-10 and OOD benchmarks show enhanced posterior diversity, better uncertainty estimation, and robustness comparable to HMC without extra inference cost. The approach yields faster mixing and stronger ensemble performance under distribution shifts and data augmentations, offering a scalable, practical method for boosting Bayesian deep learning quality. The paper also discusses architectural adaptations and empirical observations of loss landscapes and trajectory diversity, highlighting the potential for broader application of parameter expansion in SGMCMC.

Abstract

Bayesian Neural Networks (BNNs) provide a promising framework for modeling predictive uncertainty and enhancing out-of-distribution (OOD) robustness by estimating the posterior distribution of network parameters. Stochastic Gradient Markov Chain Monte Carlo (SGMCMC) is one of the most powerful methods for scalable posterior sampling in BNNs, achieving efficiency by combining stochastic gradient descent with second-order Langevin dynamics. However, SGMCMC often suffers from limited sample diversity in practice, which affects uncertainty estimation and model performance. We propose a simple yet effective approach to enhance sample diversity in SGMCMC without the need for tempering or running multiple chains. Our approach reparameterizes the neural network by decomposing each of its weight matrices into a product of matrices, resulting in a sampling trajectory that better explores the target parameter space. This approach produces a more diverse set of samples, allowing faster mixing within the same computational budget. Notably, our sampler achieves these improvements without increasing the inference cost compared to standard SGMCMC. Extensive experiments on image classification tasks, including OOD robustness, diversity, loss surface analyses, and a comparative study with Hamiltonian Monte Carlo, demonstrate the superiority of the proposed approach.
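
The reparameterization can be illustrated with a minimal NumPy sketch. It writes one weight matrix as the product $\mathbf{W} = \mathbf{P}\mathbf{V}\mathbf{Q}$ used in the TL;DR, and checks two things: a gradient step on the middle factor moves the collapsed matrix by a step preconditioned on both sides, and the collapsed matrix gives the same forward pass as the expanded factors. The layer sizes, the square shapes of the outer factors, the toy squared-error loss, the step size `lr`, and the choice to update only `V` with a noise-free gradient step are all assumptions made for illustration. The paper's sampler updates the factors with SGMCMC, which this sketch does not reproduce.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes; the shapes used in the paper are not given here.
d_out, d_in = 4, 3

# Expanded parameters: W = P @ V @ Q (square P and Q are an assumption).
P = rng.normal(size=(d_out, d_out))
V = rng.normal(size=(d_out, d_in))
Q = rng.normal(size=(d_in, d_in))


def collapse(P, V, Q):
    """Merge the expanded factors back into a single weight matrix."""
    return P @ V @ Q


# Toy loss L(W) = 0.5 * ||W x - y||^2 for one input/target pair (assumption).
x = rng.normal(size=(d_in,))
y = rng.normal(size=(d_out,))


def grad_wrt_W(W):
    """Gradient of the toy loss with respect to the collapsed matrix W."""
    return np.outer(W @ x - y, x)


W = collapse(P, V, Q)
G = grad_wrt_W(W)

# Chain rule: dL/dV = P^T G Q^T when W = P V Q.
grad_V = P.T @ G @ Q.T

# A plain gradient step on V alone (no SGMCMC noise in this sketch) ...
lr = 1e-2
V_new = V - lr * grad_V

# ... moves W by a step preconditioned by P P^T on the left and Q^T Q on the right.
delta_W = collapse(P, V_new, Q) - W
preconditioned_step = -lr * (P @ P.T) @ G @ (Q.T @ Q)

# The collapsed matrix reproduces the expanded forward pass exactly.
out_expanded = P @ (V @ (Q @ x))
out_collapsed = W @ x
```

Because the factors can be multiplied out once after sampling, each stored sample has the same size and forward-pass cost as a standard layer, which is consistent with the abstract's claim of unchanged inference cost.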

Paper Structure

This paper contains 39 sections, 2 theorems, 33 equations, 11 figures, 11 tables, 4 algorithms.

Key Result

Lemma 3.1

For an arbitrary function ${\mathcal{F}}$ whose parameter is $\mathbf{W}_{1:e}=\mathbf{W}_1\cdots\mathbf{W}_e$ with its vectorization $\mathbf{X}=\mathrm{vec}\left(\mathbf{W}_{1:e}\right)$, assume that the gradient update of each $\mathbf{W}_i$ for $i\in\{1,\dots, e\}$ is defined as the following PD… Then, their product $\mathbf{X}$ satisfies the following dynamics: … The operator $\otimes$ r…

Figures (11)

  • Figure 1: Toy results. HMC samples with the standard parameterization (SP) and the expanded parameterization (EP). The color represents the negative log probability.
  • Figure 2: Connection between exploration and singular value dynamics. The first and second plots illustrate exploration through unnormalized and normalized Euclidean distances, while the third to fifth plots depict singular value dynamics, represented by the largest and smallest singular values and condition numbers. For the 21 layers, the singular value plots feature 21 transparent lines for each item, with the maximum (or minimum) value highlighted as the representative.
  • Figure 3: Trace plots for EP. They depict training and validation errors along the trajectory.
  • Figure 4: Loss landscape analysis using HMC checkpoints. We visualize (a) linear connectivity between consecutive posterior samples and (b) a two-dimensional subspace spanned by the 0th (diamond), 1st (circle), and 2nd (pentagon) posterior samples. Both plots depict classification error on 1,000 training examples. Note that the 8th HMC sample was rejected and reverted to the 7th.
  • Figure 5: Supplementary box-and-whisker plots for CIFAR-10-C. They show the classification error (ERR), negative log-likelihood (NLL), and expected calibration error (ECE) across 19 corruption types at five intensity levels.
  • ...and 6 more figures
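
The quantities named in the Figure 2 caption (largest and smallest singular values, condition number, and unnormalized Euclidean distance) can be computed with a few NumPy calls. The sketch below is only an illustration: the helper name `spectrum_summary` and the two small diagonal matrices standing in for weight snapshots are hypothetical, and the normalized distance is left out because its definition is not stated here.

```python
import numpy as np


def spectrum_summary(W):
    """Return (largest singular value, smallest singular value, condition number)."""
    s = np.linalg.svd(W, compute_uv=False)  # singular values, sorted descending
    return s[0], s[-1], s[0] / s[-1]


# Hypothetical stand-ins for one layer's weights at two points on a trajectory.
W_prev = np.diag([3.0, 1.0])
W_next = np.diag([4.0, 0.5])

s_max, s_min, cond = spectrum_summary(W_next)

# Unnormalized Euclidean (Frobenius) distance between the two snapshots.
distance = np.linalg.norm(W_next - W_prev)
```

For a multi-layer network, the same summary would be computed per layer, which matches the caption's description of one line per layer with the maximum (or minimum) highlighted.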

Theorems & Definitions (4)

  • Lemma 3.1: Dynamics of EP
  • Theorem 3.2: Exploration
  • proof
  • proof