The Epistemic Uncertainty Hole: an issue of Bayesian Neural Networks

Mohammed Fellaji, Frédéric Pennerath

TL;DR

The paper identifies an epistemic uncertainty hole in Bayesian Deep Learning, where epistemic uncertainty $U_{epist}$ collapses for large models or certain data regimes, contrary to theoretical expectations. It formalizes the uncertainty components as $U_{total}$, $U_{aleat}$, and $U_{epist}=U_{total}-U_{aleat}$ with $U_{epist}=I(Y;W|X,D)$ and demonstrates that $\bar{U}_{epist}$ can decrease with increasing model capacity or be nonmonotonic with data size. Through motivating experiments on CIFAR10 with ResNet18 ensembles and a two-dimensional analysis using MLPs on MNIST and CIFAR10 with ensembles and MC-Dropout, the authors reveal a diagonal hole where epistemic uncertainty is unexpectedly low in regions of large models or limited data, and they show that this hole can undermine OOD detection, sometimes producing negative $\Delta \bar{U}$ and AUCs below chance. The findings challenge the practical utility of BDL in safety-critical tasks, motivating further work to diagnose the hole and devise corrective strategies to maintain informative epistemic cues. Overall, the work highlights a crucial gap between Bayesian theory and empirical behavior in deep models and its implications for OOD detection and calibration.
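
The decomposition above can be illustrated with a minimal sketch. It assumes the common entropy-based estimators over a finite set of sampled models (ensemble members or MC-Dropout passes): $U_{total}$ is the entropy of the averaged prediction, $U_{aleat}$ is the average entropy of the individual predictions, and their difference estimates the mutual information $U_{epist}$. The function names, the `(M, N, K)` array layout, and the normalization by $\log K$ are assumptions made for illustration; the summary does not state the exact normalization the authors use.

```python
import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy in nats along `axis`; probabilities are clipped to avoid log(0)."""
    p = np.clip(p, eps, 1.0)
    return -np.sum(p * np.log(p), axis=axis)

def uncertainty_decomposition(probs, normalize=True):
    """Split predictive uncertainty into total, aleatoric and epistemic parts.

    probs: array of shape (M, N, K) holding class probabilities from
           M sampled models for N inputs and K classes (assumed layout).
    Returns three arrays of shape (N,).
    """
    probs = np.asarray(probs, dtype=float)
    mean_probs = probs.mean(axis=0)          # averaged prediction, shape (N, K)
    u_total = entropy(mean_probs)            # entropy of the averaged prediction
    u_aleat = entropy(probs).mean(axis=0)    # average entropy of each model's prediction
    u_epist = u_total - u_aleat              # estimate of the mutual information
    if normalize:
        # Assumption: divide by log(K) so that all values lie in [0, 1].
        scale = np.log(probs.shape[-1])
        u_total, u_aleat, u_epist = u_total / scale, u_aleat / scale, u_epist / scale
    return u_total, u_aleat, u_epist

# Two models that confidently disagree on one binary input:
# the averaged prediction is uniform, so the uncertainty is almost entirely epistemic.
disagree = np.array([[[1.0, 0.0]], [[0.0, 1.0]]])
u_total, u_aleat, u_epist = uncertainty_decomposition(disagree)
```

When all sampled models return the same distribution, `u_epist` is zero whatever the entropy of that distribution, which is why a collapse of this quantity says that the models agree, not that they are right.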

Abstract

Bayesian Deep Learning (BDL) gives access not only to aleatoric uncertainty, as standard neural networks already do, but also to epistemic uncertainty, a measure of the confidence a model has in its own predictions. In this article, we show through experiments that the evolution of epistemic uncertainty metrics with respect to the model size and the size of the training set goes against theoretical expectations. More precisely, we observe that the epistemic uncertainty literally collapses in the presence of large models and sometimes also of little training data, whereas we expect the exact opposite behaviour. This phenomenon, which we call the "epistemic uncertainty hole", is all the more problematic as it undermines the entire applicative potential of BDL, which is based precisely on the use of epistemic uncertainty. As an example, we evaluate the practical consequences of this uncertainty hole on one of the main applications of BDL, namely the detection of out-of-distribution samples.

Paper Structure

This paper contains 10 sections, 2 equations, and 5 figures.

Figures (5)

  • Figure 1: Box plots of the epistemic uncertainty of an ensemble of ResNet18 models trained on CIFAR10 and evaluated on the CIFAR10 test set (ID: in-distribution examples). The x-axis shows the number of examples used to train the models and the y-axis shows the normalized epistemic uncertainty. ID-mis denotes the misclassified examples from the ID set, ID-all the entire ID set, and ID-good the ID examples correctly classified by the ensemble.
  • Figure 2: Heatmaps of the normalized epistemic uncertainties on ID samples: the test sets of MNIST (ensemble and MC-Dropout panels) and CIFAR10 (ensemble and MC-Dropout panels). The x-axis shows the number of neurons in the hidden layers and the y-axis shows the number of samples used to train the models. Only the average of each test is reported.
  • Figure 3: Heatmaps of the model accuracy on ID samples: the test sets of MNIST (ensemble and MC-Dropout panels) and CIFAR10 (ensemble and MC-Dropout panels). The x-axis shows the number of neurons in the hidden layers and the y-axis shows the number of samples used to train the models.
  • Figure 4: Difference between the normalized epistemic uncertainties on OOD samples and on ID samples. The x-axis shows the number of neurons in the hidden layers and the y-axis shows the number of samples used to train the models. Only the average of each test is reported.
  • Figure 5: AUC score, centered at $0.5$, for separating ID from OOD samples using the normalized epistemic uncertainty as the score (labels: $0$ for ID and $1$ for OOD).
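
The two OOD diagnostics behind Figures 4 and 5 can be sketched as follows. This is a hedged illustration, not the authors' evaluation code: the function names are hypothetical, the inputs are assumed to be per-sample normalized epistemic uncertainties, and the AUC is computed with the pairwise (Mann-Whitney) formulation, which equals the area under the ROC curve when OOD is the positive class.

```python
import numpy as np

def delta_u(id_scores, ood_scores):
    """Mean epistemic uncertainty on OOD samples minus the mean on ID samples."""
    return float(np.mean(ood_scores) - np.mean(id_scores))

def auc_from_scores(id_scores, ood_scores):
    """AUC for ranking OOD (label 1) above ID (label 0) by uncertainty score.

    Equals the probability that a random OOD sample scores higher than a
    random ID sample, with ties counted as one half. Builds an
    (n_ood, n_id) matrix, so it is only meant for small illustrative inputs.
    """
    id_scores = np.asarray(id_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)
    diff = ood_scores[:, None] - id_scores[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / diff.size)

# Healthy case: OOD samples are more uncertain than ID samples.
healthy_auc = auc_from_scores([0.1, 0.2], [0.3, 0.4])   # 1.0

# "Hole" case: the ordering is inverted, so the gap is negative
# and the AUC falls below the chance level of 0.5.
hole_gap = delta_u([0.3, 0.4], [0.1, 0.2])
hole_auc = auc_from_scores([0.3, 0.4], [0.1, 0.2])      # 0.0
```

A negative gap together with an AUC below 0.5 means the detector ranks ID samples as more uncertain than OOD samples, which is the failure mode the summary describes.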