Stochastic Weight Sharing for Bayesian Neural Networks

Moule Lin, Shuhao Guan, Weipeng Jing, Goetz Botterweck, Andrea Patane

TL;DR

The paper tackles the computational and memory bottlenecks of Bayesian neural networks by introducing 2D Gaussian Bayesian Neural Networks (2DGBNN), a stochastic weight-sharing approach that represents weight uncertainty with a small set of 2D Gaussian components. It combines Gaussian Mixture Model clustering, Wasserstein-2 distance-based merging, and alpha-blending sampling within a Variational Inference framework to compress parameters while preserving accuracy and calibrated uncertainty on large architectures such as ResNet-101 and ViT across CIFAR and ImageNet. The approach yields up to roughly $50\times$ fewer trainable parameters and significant model-size reductions, while delivering competitive NLL and ECE metrics compared to state-of-the-art Bayesian methods, enabling practical Bayesian training for large-scale vision models. This work thus provides a scalable pathway to deploy uncertainty-aware deep models in edge and resource-constrained settings and lays groundwork for further integration of stochastic weight-sharing into fully Bayesian training.
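As an illustration of the Wasserstein-2 ingredient mentioned above, the distance between two Gaussians has a well-known closed form: $W_2^2 = \lVert \mu_a - \mu_b \rVert^2 + \mathrm{Tr}\big(\Sigma_a + \Sigma_b - 2(\Sigma_b^{1/2} \Sigma_a \Sigma_b^{1/2})^{1/2}\big)$. The sketch below evaluates it for 2D Gaussians with NumPy. The function names and the example values are hypothetical and chosen for illustration only; how the paper actually uses this distance to decide which components to merge is not specified in this summary.

```python
import numpy as np


def sqrtm_psd(mat):
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T


def w2_squared(mean_a, cov_a, mean_b, cov_b):
    """Squared Wasserstein-2 distance between N(mean_a, cov_a) and N(mean_b, cov_b)."""
    root_b = sqrtm_psd(cov_b)
    cross = sqrtm_psd(root_b @ cov_a @ root_b)
    mean_term = np.sum((mean_a - mean_b) ** 2)
    cov_term = np.trace(cov_a + cov_b - 2.0 * cross)
    return float(mean_term + cov_term)


# Hypothetical example: two 2D Gaussian components (values are illustrative only).
mean_a, cov_a = np.array([0.0, 0.0]), np.eye(2)
mean_b, cov_b = np.array([3.0, 4.0]), np.eye(2)
distance_sq = w2_squared(mean_a, cov_a, mean_b, cov_b)
```

With equal covariances the covariance term vanishes, so the result reduces to the squared Euclidean distance between the means. A merging rule could compare this value against a threshold, but any such threshold would be an assumption beyond what the text states.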

Abstract

While offering a principled framework for uncertainty quantification in deep learning, the employment of Bayesian Neural Networks (BNNs) is still constrained by their increased computational requirements and the convergence difficulties when training very deep, state-of-the-art architectures. In this work, we reinterpret weight-sharing quantization techniques from a stochastic perspective in the context of training and inference with BNNs. Specifically, we leverage 2D adaptive Gaussian distributions, Wasserstein distance estimations, and alpha blending to encode the stochastic behaviour of a BNN in a lower dimensional, soft Gaussian representation. Through extensive empirical investigation, we demonstrate that our approach significantly reduces the computational overhead inherent in Bayesian learning by several orders of magnitude, enabling the efficient Bayesian training of large-scale models, such as ResNet-101 and Vision Transformer (ViT). On various computer vision benchmarks, including CIFAR10, CIFAR100, and ImageNet1k, our approach compresses model parameters by approximately 50x and reduces model size by 75%, while achieving accuracy and uncertainty estimations comparable to the state-of-the-art.

Paper Structure

This paper contains 34 sections, 16 equations, 3 figures, 14 tables, 1 algorithm.

Figures (3)

  • Figure 1: Weight distribution for the BNN prior to stochastic sharing. Panel (a) shows the density plot for the second convolutional layer of ResNet-50 when trained on ImageNet (in blue) and CIFAR-100 (in red). Panel (b) shows the corresponding scatter plots, including lines for the 1st and 99th percentiles.
  • Figure 2: Left: Posterior distribution of weights in ResNet-18 on CIFAR-10 using the 2D Gaussian Bayesian Neural Network (2DGBNN), depicting Gaussian centers (blue), ellipse weights (yellow), and outlier (exceptional) weights (red). Right: Confidence distribution of predictions on the CIFAR-10 dataset. Mean confidence values are shown above each boxplot.
  • Figure 3: Initial 2D Gaussians of the second convolutional layer of ResNet-18 on the CIFAR-10 dataset. Different colors represent the different clusters. The solid line is the first variance area, and the dotted line is the second variance area.