A Compact Representation for Bayesian Neural Networks By Removing Permutation Symmetry

Tim Z. Xiao, Weiyang Liu, Robert Bamler

TL;DR

Permutation symmetry in neural networks makes weight-space posteriors multimodal and hard to summarize. The authors introduce the number of transpositions ($NoT$) metric to quantify the magnitude of a permutation, and apply rebasin to align posterior samples, producing a compact posterior $q_r(W) = \mathcal N(\mu_r, \mathrm{diag}(\sigma_r^2))$ that enables direct weight-space comparisons across inference methods. They show that $NoT$ is stable under training and correlates with the loss barrier, and that $q_r(W)$ approximates the posterior more closely than the direct approximation $q_d(W)$ while also enabling merging and pruning across models. The approach offers interpretable per-weight uncertainty, practical compatibility across inference methods, and a route to more efficient Bayesian deep learning.
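
To make the contrast between $q_d(W)$ and $q_r(W)$ concrete, here is a minimal sketch on a toy one-hidden-layer network. Everything in it is an illustrative assumption rather than the authors' code: the helper names (`mlp`, `permute`, `align`, `mean_and_std`), the synthetic "posterior samples", and in particular `align`, which sorts hidden units by input weight as a toy stand-in for rebasin. It only shows that permuting hidden units leaves the network function unchanged while inflating the per-weight standard deviation of unaligned samples.

```python
import math
import random
import statistics

def mlp(x, w1, w2):
    """Toy network: y = sum_j w2[j] * tanh(w1[j] * x)."""
    return sum(b * math.tanh(a * x) for a, b in zip(w1, w2))

def permute(w1, w2, perm):
    """Reorder hidden units; the function computed by the network is unchanged."""
    return [w1[j] for j in perm], [w2[j] for j in perm]

def align(w1, w2):
    """Toy stand-in for rebasin (an assumption): sort hidden units by input weight."""
    order = sorted(range(len(w1)), key=lambda j: w1[j])
    return permute(w1, w2, order)

def mean_and_std(samples):
    """Per-weight mean and standard deviation across samples."""
    columns = list(zip(*samples))
    return ([statistics.mean(c) for c in columns],
            [statistics.pstdev(c) for c in columns])

rng = random.Random(0)
base_w1, base_w2 = [-1.0, 0.5, 2.0], [0.3, -0.7, 1.1]  # hypothetical weights

raw, aligned = [], []
for _ in range(200):
    # Synthetic sample: small noise around the base weights, then a random permutation.
    w1 = [w + rng.gauss(0.0, 0.01) for w in base_w1]
    w2 = [w + rng.gauss(0.0, 0.01) for w in base_w2]
    perm = list(range(3))
    rng.shuffle(perm)
    w1, w2 = permute(w1, w2, perm)
    raw.append(w1)
    aligned.append(align(w1, w2)[0])

mu_d, sigma_d = mean_and_std(raw)      # direct summary, analogous to q_d
mu_r, sigma_r = mean_and_std(aligned)  # summary after alignment, analogous to q_r
print("sigma_d:", sigma_d)
print("sigma_r:", sigma_r)
```

In this toy setting the aligned standard deviations stay near the injected noise level, whereas the unaligned ones reflect the spread between different hidden units rather than any genuine uncertainty.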

Abstract

Bayesian neural networks (BNNs) are a principled approach to modeling predictive uncertainties in deep learning, which are important in safety-critical applications. Since exact Bayesian inference over the weights in a BNN is intractable, various approximate inference methods exist, among which sampling methods such as Hamiltonian Monte Carlo (HMC) are often considered the gold standard. While HMC provides high-quality samples, it lacks interpretable summary statistics because its sample mean and variance are meaningless in neural networks due to permutation symmetry. In this paper, we first show that the role of permutations can be meaningfully quantified by a number-of-transpositions (NoT) metric. We then show that the recently proposed rebasin method allows us to summarize HMC samples into a compact representation that provides a meaningful explicit uncertainty estimate for each weight in a neural network, thus unifying sampling methods with variational inference. We show that this compact representation allows us to compare trained BNNs directly in weight space across sampling methods and variational inference, and to efficiently prune neural networks trained without explicit Bayesian frameworks by exploiting uncertainty estimates from HMC.
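
As a rough illustration of a number-of-transpositions count, the sketch below computes the minimum number of pairwise swaps needed to produce a given permutation from the identity, which equals the permutation's length minus its number of cycles. This assumes NoT refers to that minimal count for a single permutation; the function name `number_of_transpositions` is hypothetical, and how the paper applies the metric to a full network is not specified here.

```python
def number_of_transpositions(perm):
    """Minimum number of swaps that turn the identity into `perm`.

    `perm` is a list where perm[i] is the index that position i maps to.
    The result equals len(perm) minus the number of cycles in `perm`.
    """
    seen = [False] * len(perm)
    cycles = 0
    for start in range(len(perm)):
        if seen[start]:
            continue
        cycles += 1
        j = start
        while not seen[j]:  # walk the cycle containing `start`
            seen[j] = True
            j = perm[j]
    return len(perm) - cycles

print(number_of_transpositions([0, 1, 2, 3]))  # identity: 0
print(number_of_transpositions([1, 0, 2, 3]))  # one swap: 1
print(number_of_transpositions([1, 2, 0, 3]))  # one 3-cycle: 2
```

Counting cycles keeps the computation linear in the number of units, so it stays cheap even for wide layers.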

Paper Structure (8 sections, 1 equation, 3 figures, 1 table)


Figures (3)

  • Figure 1: Training dynamics for models with $\mathbf{W}_0$ and $\mathbf{W}_1$, and their interpolations $\mathbf W_\lambda$.
  • Figure 2: Left three: effect of permuting initial weights by different Number of Transpositions (NoT) on NoT after training, weight-space distance, and loss barrier (shaded regions: $\pm1\sigma$ over 5 runs). Right: NoT changes monotonically along the interpolation $\mathbf W_\lambda$ between two models $\mathbf{W}_0$ and $\mathbf{W}_1$.
  • Figure 3: Left: histograms of the standard deviation $\bm{\sigma}$ of weights before ($\bm{\sigma}_{\mathrm{d}}$) and after ($\bm{\sigma}_{\mathrm{r}}$) rebasin. Right: test accuracy vs. various levels of weight pruning (retaining only weights with lowest $\bm{\sigma}$).
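
The pruning rule named in the Figure 3 caption (retaining only the weights with the lowest $\bm{\sigma}$) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `prune_by_uncertainty`, the example values, and the choice to set pruned weights to zero are all hypothetical.

```python
def prune_by_uncertainty(mu, sigma, keep_fraction):
    """Keep the weights with the lowest standard deviation; zero out the rest.

    mu and sigma are per-weight means and standard deviations of equal length.
    Zeroing pruned weights is an assumption made for this sketch.
    """
    n_keep = round(keep_fraction * len(mu))
    by_sigma = sorted(range(len(mu)), key=lambda i: sigma[i])
    keep = set(by_sigma[:n_keep])
    return [m if i in keep else 0.0 for i, m in enumerate(mu)]

mu = [0.8, -0.1, 1.5, 0.3]        # hypothetical per-weight means
sigma = [0.05, 0.9, 0.02, 0.4]    # hypothetical per-weight standard deviations
print(prune_by_uncertainty(mu, sigma, keep_fraction=0.5))  # [0.8, 0.0, 1.5, 0.0]
```

Such a ranking is only meaningful when each $\sigma$ describes a single, consistently identified weight across samples.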