Bayesian Lottery Ticket Hypothesis

Nicholas Kuhn, Arvid Weyrauch, Lars Heyen, Achim Streit, Markus Götz, Charlotte Debus

TL;DR

The paper addresses the computational burden of Bayesian neural networks (BNNs) by testing whether the Lottery Ticket Hypothesis (LTH) holds in Bayesian settings. It translates Iterative Magnitude Pruning (IMP) to mean-field variational Bayes across CNNs and a Vision Transformer on CIFAR-10, compares Bayesian tickets to deterministic baselines, and examines the transplantation of non-Bayesian tickets into BNNs. Bayesian lottery tickets are found across all architectures, with deeper layers pruned more heavily. At high sparsity, pruning strategies that emphasize mean magnitude and uncertainty outperform the alternatives, and the transplantation approach can reduce training time by 3–7x while maintaining calibration. This suggests that sparse Bayesian training is feasible and calibration-friendly, offering a practical route to efficient uncertainty-aware models when computational resources are limited.

Abstract

Bayesian neural networks (BNNs) are a useful tool for uncertainty quantification, but require substantially more computational resources than conventional neural networks. For non-Bayesian networks, the Lottery Ticket Hypothesis (LTH) posits the existence of sparse subnetworks that can be trained to match or even surpass the accuracy of the original dense network. Such sparse networks can lower the demand for computational resources both at inference and during training. Confirming the LTH and the existence of corresponding sparse subnetworks in BNNs could motivate the development of sparse training algorithms and provide valuable insights into the underlying training process. Towards this end, we translate the LTH experiments to a Bayesian setting using common computer vision models. We investigate the defining characteristics of Bayesian lottery tickets, and extend our study towards a transplantation method connecting BNNs with deterministic lottery tickets. We generally find that the LTH holds in BNNs, and that winning tickets of matching and surpassing accuracy are present independent of model size, with degradation at very high sparsities. However, the pruning strategy should rely primarily on magnitude and secondarily on standard deviation. Furthermore, our results demonstrate that models rely on mask structure and weight initialization to varying degrees.
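
To make the pruning loop concrete, the following is a minimal NumPy sketch of iterative pruning on a mean-field posterior, where each weight has a mean `mu` and a standard deviation `sigma`. It is an illustration under stated assumptions, not the authors' implementation: the two scoring functions (`magnitude_score`, `snr_score`), the 20% per-round pruning fraction, and the `toy_train` placeholder are all hypothetical choices, since the abstract names only magnitude and standard deviation as the relevant quantities.

```python
import numpy as np


def magnitude_score(mu, sigma):
    """Score by |mu| only (assumed variant: ignores uncertainty)."""
    return np.abs(mu)


def snr_score(mu, sigma):
    """Score by |mu| / sigma (assumed variant: favors large, certain weights)."""
    return np.abs(mu) / sigma


def prune_by_score(score, mask, fraction):
    """Return a new mask with the lowest-scoring `fraction` of surviving weights removed."""
    alive = np.flatnonzero(mask)                      # flat indices of unpruned weights
    n_prune = int(np.floor(fraction * alive.size))
    new_mask = mask.flatten()                         # flatten() always returns a copy
    if n_prune > 0:
        order = np.argsort(score.ravel()[alive], kind="stable")
        new_mask[alive[order[:n_prune]]] = False
    return new_mask.reshape(mask.shape)


def toy_train(mu, sigma, mask):
    """Placeholder for variational training: returns the masked parameters unchanged."""
    return mu * mask, sigma


def iterative_pruning(mu_init, sigma_init, train_fn, score_fn, rounds, fraction):
    """Repeat: rewind to the initial parameters, train under the mask, prune by score."""
    mask = np.ones(mu_init.shape, dtype=bool)
    for _ in range(rounds):
        mu, sigma = train_fn(mu_init * mask, sigma_init, mask)
        mask = prune_by_score(score_fn(mu, sigma), mask, fraction)
    return mask


rng = np.random.default_rng(0)
mu0 = rng.normal(size=(8, 8))
sigma0 = np.full((8, 8), 0.1)
ticket = iterative_pruning(mu0, sigma0, toy_train, snr_score, rounds=5, fraction=0.2)
print("fraction of weights remaining:", ticket.mean())
```

In a real experiment, `toy_train` would be replaced by a full variational training run, and the score would be computed per layer or globally depending on the pruning scheme; neither detail is specified in this summary.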

Paper Structure (20 sections, 4 equations, 9 figures, 1 table)

This paper contains 20 sections, 4 equations, 9 figures, 1 table.

Figures (9)

  • Figure 1: Test accuracy and MACE vs. percentage of weights remaining for ResNet18, VGG11 and ViT-tiny models trained on CIFAR-10. Shown in color are the different scoring functions for pruning; in black are the non-Bayesian models pruned by magnitude. The x-axis is shown on a logarithmic scale; the y-axis is adjusted to include the full range of observed accuracies.
  • Figure 2: Layer-wise sparsity vs. layer index for ResNet (left), VGG (center), and ViT (right) after 20 rounds of training and pruning. The x-axis denotes the layer index, and the y-axis shows the resulting sparsity in each layer. Each plot shows the non-Bayesian model and the Bayesian model.
  • Figure 3: Visualization of the different randomizations applied to the weights (magnitude symbolized through color) and masks (binary, i.e. black and white) of a winning ticket.
  • Figure 4: Test accuracy vs. percentage of weights remaining (log scale) for ResNet (left), VGG (middle), and ViT (right) trained on CIFAR-10. The black line denotes the original lottery ticket obtained through IMP; the colored lines represent the accuracy achieved by reinitializing or shuffling and then training the ticket of that sparsity level. The Bayesian and non-Bayesian models are shown in separate plots.
  • Figure 5: Test accuracy vs. percentage of weights remaining (log scale) for ResNet (left), VGG (middle), and ViT (right) trained on CIFAR-10. Shown are lottery tickets (LTs) generated through deterministic IMP, a fully Bayesian approach, and the transplantation method.
  • ...and 4 more figures
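
The quantities behind Figures 2 to 4 can be sketched in a few lines of NumPy. The example below is a hedged illustration only: the captions do not spell out which randomization variants were used, so `shuffle_mask` (permute a mask within its layer) and `reinitialize` (keep the mask, redraw the weights) are assumed interpretations, and the layer names, shapes, and initialization scale are hypothetical.

```python
import numpy as np


def layer_sparsity(masks):
    """Fraction of pruned weights per layer, given {layer_name: boolean mask}."""
    return {name: 1.0 - float(m.mean()) for name, m in masks.items()}


def shuffle_mask(mask, rng):
    """Permute a mask within its layer: sparsity is preserved, structure is destroyed."""
    flat = mask.flatten()          # copy, so the original ticket is left untouched
    rng.shuffle(flat)              # in-place shuffle of the 1-D copy
    return flat.reshape(mask.shape)


def reinitialize(mask, rng, scale=0.1):
    """Keep the mask but draw fresh initial weights (assumed Gaussian init)."""
    return rng.normal(0.0, scale, size=mask.shape) * mask


rng = np.random.default_rng(0)

# Hypothetical two-layer ticket: True marks a surviving weight.
masks = {
    "layer0": rng.random((16, 16)) < 0.5,
    "layer1": rng.random((16, 4)) < 0.1,
}

print(layer_sparsity(masks))

shuffled = {name: shuffle_mask(m, rng) for name, m in masks.items()}
fresh = {name: reinitialize(m, rng) for name, m in masks.items()}
```

Comparing a ticket against its shuffled and reinitialized counterparts separates the contribution of the mask structure from that of the initial weights, which is the comparison Figure 4 reports.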