Why Machine Learning Models Fail to Fully Capture Epistemic Uncertainty
Sebastián Jiménez, Mira Jürgens, Willem Waegeman
TL;DR
This paper tackles the mismatch between what common second-order uncertainty methods estimate and the full epistemic uncertainty of ML models. It introduces a fine-grained taxonomy and a simulation-based evaluation framework built on a reference distribution that accounts for data and procedural randomness. Within this framework, it provides a regression-specific bias-variance decomposition and demonstrates that high model bias can cause epistemic uncertainty to be underestimated, with many methods misattributing bias to aleatoric uncertainty. Through synthetic experiments and a real NYC taxi dataset, the authors show that typical approaches (e.g., Deep Ensembles) predominantly capture procedural uncertainty and fail to represent data-driven epistemic components, leading to distorted uncertainty partitions. The work highlights the need for task-aware evaluation protocols and full representation of all epistemic sources to obtain reliable and interpretable uncertainty estimates for downstream tasks such as active learning and out-of-distribution detection.
Abstract
In recent years, various supervised learning methods have been proposed that disentangle aleatoric and epistemic uncertainty based on second-order distributions. We argue that these methods fail to capture critical components of epistemic uncertainty, particularly the often-neglected component of model bias. To show this, we use a more fine-grained taxonomy of epistemic uncertainty sources in machine learning models and analyse how the classical bias-variance decomposition of the expected prediction error can be split into parts that reflect these sources. Using a simulation-based evaluation protocol that encompasses both procedural and data-driven components of epistemic uncertainty, we illustrate that current methods rarely capture the full spectrum of epistemic uncertainty. Through theoretical insights and synthetic experiments, we show that high model bias can lead to misleadingly low estimates of epistemic uncertainty, and that common second-order uncertainty quantification methods systematically blur bias-induced errors into aleatoric estimates, thereby underrepresenting epistemic uncertainty. Our findings underscore that meaningful aleatoric estimates are feasible only if all relevant sources of epistemic uncertainty are properly represented.
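The claim that high model bias can coexist with a low epistemic estimate can be illustrated with a small simulation. The sketch below is not the authors' protocol; it is a minimal toy under stated assumptions: a hypothetical ground-truth function `sin(3x)`, Gaussian noise with a fixed standard deviation, and polynomial least-squares models refitted on freshly sampled datasets. Because least squares has no procedural randomness, only the data-driven component is resampled here. The spread of predictions across refits stands in for a variance-based epistemic estimate, and the squared gap between the average prediction and the ground truth is the squared bias from the classical decomposition (expected squared error = noise variance + squared bias + variance).

```python
import numpy as np

NOISE_STD = 0.1  # assumed aleatoric noise level (hypothetical choice)


def true_fn(x):
    """Hypothetical ground-truth regression function."""
    return np.sin(3.0 * x)


def sample_dataset(rng, n=50):
    """Draw one training set from the assumed data-generating process."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = true_fn(x) + rng.normal(0.0, NOISE_STD, size=n)
    return x, y


def simulate(degree, n_repeats=500, seed=0):
    """Refit a polynomial of the given degree on many resampled datasets.

    Returns (squared bias, variance), both averaged over a test grid.
    """
    rng = np.random.default_rng(seed)
    x_test = np.linspace(-1.0, 1.0, 101)
    preds = np.empty((n_repeats, x_test.size))
    for r in range(n_repeats):
        x, y = sample_dataset(rng)
        coeffs = np.polyfit(x, y, degree)       # least-squares fit
        preds[r] = np.polyval(coeffs, x_test)   # predictions on the grid
    mean_pred = preds.mean(axis=0)
    bias_sq = float(np.mean((mean_pred - true_fn(x_test)) ** 2))
    variance = float(np.mean(preds.var(axis=0)))
    return bias_sq, variance


if __name__ == "__main__":
    for degree in (1, 5):
        bias_sq, variance = simulate(degree)
        print(f"degree={degree}: bias^2={bias_sq:.4f}, variance={variance:.4f}")
```

In this toy setting the linear model (degree 1) is misspecified, so its predictions vary little across datasets while remaining far from the ground truth. A method that reports only the variance term would therefore signal low epistemic uncertainty exactly where the model is most wrong, which is the failure mode the abstract describes.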
