The Epistemic Uncertainty Hole: an issue of Bayesian Neural Networks
Mohammed Fellaji, Frédéric Pennerath
TL;DR
The paper identifies an epistemic uncertainty hole in Bayesian Deep Learning, where epistemic uncertainty $U_{epist}$ collapses for large models or certain data regimes, contrary to theoretical expectations. It formalizes the uncertainty components as $U_{total}$, $U_{aleat}$, and $U_{epist}=U_{total}-U_{aleat}$ with $U_{epist}=I(Y;W|X,D)$ and demonstrates that $\bar{U}_{epist}$ can decrease with increasing model capacity or be nonmonotonic with data size. Through motivating experiments on CIFAR10 with ResNet18 ensembles and a two-dimensional analysis using MLPs on MNIST and CIFAR10 with ensembles and MC-Dropout, the authors reveal a diagonal hole where epistemic uncertainty is unexpectedly low in regions of large models or limited data, and they show that this hole can undermine OOD detection, sometimes producing negative $\Delta \bar{U}$ and AUCs below chance. The findings challenge the practical utility of BDL in safety-critical tasks, motivating further work to diagnose the hole and devise corrective strategies to maintain informative epistemic cues. Overall, the work highlights a crucial gap between Bayesian theory and empirical behavior in deep models and its implications for OOD detection and calibration.
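The decomposition $U_{epist}=U_{total}-U_{aleat}$ can be illustrated with a minimal sketch. It assumes the usual entropy-based estimators for classification (total uncertainty as the entropy of the averaged prediction, aleatoric uncertainty as the average entropy of each member's prediction); the text above does not spell these estimators out, so they are an assumption here. The function name `uncertainty_decomposition` and the toy array are hypothetical.

```python
import numpy as np

def uncertainty_decomposition(probs, eps=1e-12):
    """Split predictive uncertainty into total, aleatoric and epistemic parts.

    probs: array of shape (M, N, K) holding class probabilities from
           M sampled models (ensemble members or MC-Dropout passes),
           for N inputs and K classes.
    Assumed estimators (not taken from the paper's text):
      U_total = H[ mean_m p_m(y|x) ]
      U_aleat = mean_m H[ p_m(y|x) ]
      U_epist = U_total - U_aleat   (a mutual-information estimate)
    """
    probs = np.asarray(probs, dtype=float)
    mean_probs = probs.mean(axis=0)                                  # (N, K)
    u_total = -np.sum(mean_probs * np.log(mean_probs + eps), axis=-1)
    member_entropy = -np.sum(probs * np.log(probs + eps), axis=-1)   # (M, N)
    u_aleat = member_entropy.mean(axis=0)
    u_epist = u_total - u_aleat
    return u_total, u_aleat, u_epist

# Toy example with 2 members, 2 inputs, 2 classes.
# Input 0: the members are confident but disagree (high epistemic uncertainty).
# Input 1: the members agree on a uniform prediction (purely aleatoric).
toy_probs = np.array([
    [[1.0, 0.0], [0.5, 0.5]],
    [[0.0, 1.0], [0.5, 0.5]],
])
u_total, u_aleat, u_epist = uncertainty_decomposition(toy_probs)
print(u_epist)  # close to log(2) for input 0 and close to 0 for input 1
```

In this sketch, an averaged quantity such as $\bar{U}_{epist}$ would simply be `u_epist.mean()` over a set of test inputs.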
Abstract
Bayesian Deep Learning (BDL) gives access not only to aleatoric uncertainty, as standard neural networks already do, but also to epistemic uncertainty, a measure of the confidence a model has in its own predictions. In this article, we show through experiments that the evolution of epistemic uncertainty metrics with respect to the model size and the size of the training set goes against theoretical expectations. More precisely, we observe that the epistemic uncertainty literally collapses in the presence of large models, and sometimes also of little training data, whereas we expect the exact opposite behaviour. This phenomenon, which we call the "epistemic uncertainty hole", is all the more problematic as it undermines the entire practical potential of BDL, which is based precisely on the use of epistemic uncertainty. As an example, we evaluate the practical consequences of this uncertainty hole on one of the main applications of BDL, namely the detection of out-of-distribution samples.
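To make the out-of-distribution (OOD) use case concrete, the following is a minimal sketch of how epistemic uncertainty scores might be evaluated as an OOD detector. It assumes that a higher score should flag an OOD sample, that the AUC is the standard area under the ROC curve (computed here through its rank-based definition), and that the gap in mean uncertainty is defined as OOD mean minus in-distribution mean; these conventions, the function names and the toy scores are all assumptions for illustration, not the authors' evaluation code.

```python
import numpy as np

def ood_auc(scores_id, scores_ood):
    """AUROC for separating OOD from in-distribution samples.

    Equals the probability that a random OOD sample receives a higher
    score than a random in-distribution sample (ties count as 0.5).
    Values below 0.5 mean the score ranks OOD samples as *less* uncertain.
    """
    scores_id = np.asarray(scores_id, dtype=float)
    scores_ood = np.asarray(scores_ood, dtype=float)
    diff = scores_ood[:, None] - scores_id[None, :]   # all OOD/ID pairs
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def mean_uncertainty_gap(scores_id, scores_ood):
    """Assumed definition of the gap: mean OOD score minus mean ID score."""
    return float(np.mean(scores_ood) - np.mean(scores_id))

# Hypothetical epistemic uncertainty scores for a healthy detector...
healthy_id, healthy_ood = [0.1, 0.2, 0.3], [0.4, 0.5]
# ...and for a collapsed one, where OOD inputs look more certain than ID inputs.
collapsed_id, collapsed_ood = [0.4, 0.5], [0.1, 0.2, 0.3]

print(ood_auc(healthy_id, healthy_ood), mean_uncertainty_gap(healthy_id, healthy_ood))
print(ood_auc(collapsed_id, collapsed_ood), mean_uncertainty_gap(collapsed_id, collapsed_ood))
```

Under these assumptions, a negative gap together with an AUC below 0.5 is the signature of a detector that has been undermined: the score still separates the two populations, but in the wrong direction.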
