Sequential Bayesian Neural Subnetwork Ensembles
Sanket Jantre, Shrijita Bhattacharya, Nathan M. Urban, Byung-Jun Yoon, Tapabrata Maiti, Prasanna Balaprakash, Sandeep Madireddy
TL;DR
The paper addresses the high cost and limited flexibility of traditional deep ensembles by introducing SeBayS, a sequential ensembling framework for Bayesian neural networks that keeps the model sparse throughout training. It combines a one-shot exploration phase with multiple exploitation phases to generate diverse Bayesian subnetworks in a single forward pass, using a prune-grow mechanism and a sparsity-aware variational inference (VI) objective. Empirically, SeBayS and its BayS variant outperform dense and sparse baselines in accuracy, calibration, out-of-distribution (OoD) detection, and adversarial robustness on CIFAR-10/100 with Wide ResNet-28-10, while reducing training and memory costs. The approach offers a scalable, robust ensembling technique for uncertainty estimation, with potential extensions to structured sparsity and energy-efficient uncertainty frameworks.
Abstract
Deep ensembles have emerged as a powerful technique for improving predictive performance and enhancing model robustness across various applications by leveraging model diversity. However, traditional deep ensemble methods are often computationally expensive and rely on deterministic models, which may limit their flexibility. Additionally, while sparse subnetworks of dense models have shown promise in matching the performance of their dense counterparts and even enhancing robustness, existing methods for inducing sparsity typically incur training costs comparable to those of training a single dense model, as they either gradually prune the network during training or apply thresholding post-training. In light of these challenges, we propose an approach for sequential ensembling of dynamic Bayesian neural subnetworks that consistently maintains reduced model complexity throughout the training process while generating diverse ensembles in a single forward pass. Our approach involves an initial exploration phase to identify high-performing regions within the parameter space, followed by multiple exploitation phases that take advantage of the compactness of the sparse model. These exploitation phases quickly converge to different minima in the energy landscape, corresponding to high-performing subnetworks that together form a diverse and robust ensemble. We empirically demonstrate that our proposed approach outperforms traditional dense and sparse deterministic and Bayesian ensemble models in terms of prediction accuracy, uncertainty estimation, out-of-distribution detection, and adversarial robustness.
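To make the idea of keeping model complexity fixed while changing the subnetwork concrete, here is a minimal sketch of one generic prune-and-grow step on a binary weight mask. It is not the paper's procedure: the function name `prune_and_grow`, the magnitude-based pruning criterion, and the random regrowth rule are all assumptions borrowed from common dynamic sparse training practice, chosen only to show how the number of active weights can stay constant across an update.

```python
import random


def prune_and_grow(weights, mask, frac, rng):
    """Hypothetical helper: return a new 0/1 mask with the same active count.

    weights: list of floats; mask: list of 0/1 flags of equal length.
    frac: fraction of currently active weights to drop and regrow.
    """
    active = [i for i, m in enumerate(mask) if m == 1]
    inactive = [i for i, m in enumerate(mask) if m == 0]
    k = min(int(frac * len(active)), len(inactive))

    # Prune: drop the k active weights with the smallest magnitude
    # (assumed criterion, not taken from the paper).
    dropped = sorted(active, key=lambda i: abs(weights[i]))[:k]
    # Grow: reactivate k previously inactive positions at random
    # (assumed criterion, not taken from the paper).
    grown = rng.sample(inactive, k)

    new_mask = list(mask)
    for i in dropped:
        new_mask[i] = 0
    for i in grown:
        new_mask[i] = 1
    return new_mask


rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(20)]
mask = [1] * 5 + [0] * 15  # 5 of 20 weights active, i.e. 75% sparse
new_mask = prune_and_grow(weights, mask, frac=0.4, rng=rng)
```

Because the step removes and adds the same number of connections, the sparsity level is unchanged while the set of active weights moves, which is the property the abstract describes as maintaining reduced model complexity throughout training.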
