On Local Posterior Structure in Deep Ensembles
Mikkel Jordahn, Jonas Vestergaard Jensen, Mikkel N. Schmidt, Michael Riis Andersen
TL;DR
The paper investigates whether adding local posterior structure to deep ensembles (DEs), which yields deep ensembles of BNNs (DE-BNNs), improves uncertainty quantification. Across multiple datasets and architectures, it finds that large DEs generally outperform DE-BNNs on in-distribution metrics, while DE-BNNs can offer out-of-distribution gains at an in-distribution cost. The study evaluates SWAG, Last-Layer Laplace Approximation (LLLA), and LA-NF as post-hoc local-posterior methods and performs extensive sensitivity analyses, concluding that plain DEs are often the pragmatic choice when the ensemble is large. It also provides practical guidance on when DE-BNNs may be preferable and open-sources a large set of trained models for further research.
Abstract
Bayesian Neural Networks (BNNs) often improve model calibration and predictive uncertainty quantification compared to point estimators such as maximum-a-posteriori (MAP). Deep ensembles (DEs) are likewise known to improve calibration, so it is natural to hypothesize that deep ensembles of BNNs (DE-BNNs) should provide even further improvements. In this work, we systematically investigate this hypothesis across a number of datasets, neural network architectures, and BNN approximation methods and surprisingly find that when the ensembles grow large enough, DEs consistently outperform DE-BNNs on in-distribution data. To shed light on this observation, we conduct several sensitivity and ablation studies. Moreover, we show that even though DE-BNNs outperform DEs on out-of-distribution metrics, this comes at the cost of decreased in-distribution performance. As a final contribution, we open-source the large pool of trained models to facilitate further research on this topic.
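
To make the distinction between a DE and a DE-BNN concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes linear-softmax classifiers as ensemble members and uses an isotropic Gaussian around each member's weights as a simple stand-in for a local posterior approximation such as SWAG or LLLA. The function names (`de_predict`, `de_bnn_predict`) and the parameters `scale` and `n_samples` are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(logits, axis=-1):
    # Numerically stable softmax.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def de_predict(x, weights):
    """DE predictive: average the class probabilities of M point estimates.

    x: (N, D) inputs; weights: (M, D, C), one weight matrix per member.
    """
    probs = softmax(np.einsum("nd,mdc->mnc", x, weights))  # (M, N, C)
    return probs.mean(axis=0)  # (N, C)


def de_bnn_predict(x, weights, scale, n_samples, rng):
    """DE-BNN predictive: average over samples drawn around each member.

    Assumption: an isotropic Gaussian with standard deviation `scale`
    stands in for the local posterior of each ensemble member.
    """
    n_members, dim, n_classes = weights.shape
    noise = rng.normal(size=(n_members, n_samples, dim, n_classes)) * scale
    samples = weights[:, None] + noise  # (M, S, D, C)
    probs = softmax(np.einsum("nd,msdc->msnc", x, samples))  # (M, S, N, C)
    return probs.mean(axis=(0, 1))  # (N, C)


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))           # 4 inputs with 3 features each
weights = rng.normal(size=(5, 3, 2))  # 5 members, 2 classes

p_de = de_predict(x, weights)
p_de_bnn = de_bnn_predict(x, weights, scale=0.1, n_samples=20, rng=rng)
```

In this sketch, setting `scale` to zero collapses every local posterior onto its point estimate, so the DE-BNN predictive reduces exactly to the DE predictive. The question studied in the paper is whether the extra averaging over local samples helps once the number of members is large.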
