Benchmarking Uncertainty Disentanglement: Specialized Uncertainties for Specialized Tasks
Bálint Mucsányi, Michael Kirchhof, Seong Joon Oh
TL;DR
This work addresses the practical disentanglement of aleatoric and epistemic uncertainty by conducting the first large-scale benchmark of 19 uncertainty quantification methods across 13 uncertainty tasks on ImageNet-1k and CIFAR-10. It tests two decomposition formulas (information-theoretic and Bregman) and a broad suite of distributional and deterministic estimators, using multiple aggregators and five seeds. The key finding is that none of the examined approaches truly disentangles the sources of uncertainty in practice: the aleatoric and epistemic estimates are highly correlated with each other, and task performance varies widely across methods, so there is no one-size-fits-all solution. The study provides practical guidance on which specialized estimator to use for which task, highlights opportunities for task-centric disentangled uncertainties, and emphasizes the need for broader ground-truth data for aleatoric uncertainty. All code, logs, and benchmarks are made available to support reproducibility and further research.
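To make the information-theoretic decomposition mentioned above concrete, here is a minimal sketch of its standard textbook form: total uncertainty is the entropy of the averaged prediction, the aleatoric part is the average entropy of the individual predictions, and the epistemic part is their difference (the mutual information). The function name, the array layout, and the `eps` smoothing constant are assumptions made for illustration; this is not the authors' reimplementation.

```python
import numpy as np


def information_theoretic_decomposition(probs, eps=1e-12):
    """Split predictive entropy into aleatoric and epistemic parts.

    probs: array of shape (members, samples, classes) holding class
    probabilities from several posterior samples or ensemble members
    (hypothetical layout, chosen for this sketch).
    Returns three arrays of shape (samples,), in nats.
    """
    probs = np.asarray(probs, dtype=float)

    # Total uncertainty: entropy of the member-averaged prediction.
    mean_probs = probs.mean(axis=0)
    total = -(mean_probs * np.log(mean_probs + eps)).sum(axis=-1)

    # Aleatoric estimate: average entropy of each member's prediction.
    member_entropy = -(probs * np.log(probs + eps)).sum(axis=-1)
    aleatoric = member_entropy.mean(axis=0)

    # Epistemic estimate: the remainder, i.e. the mutual information
    # between the prediction and the choice of member.
    epistemic = total - aleatoric
    return total, aleatoric, epistemic


# Two members, two inputs, two classes.
# Input 0: both members predict a coin flip (members agree, outcome is noisy).
# Input 1: members are each confident but contradict one another.
example_probs = np.array([
    [[0.5, 0.5], [1.0, 0.0]],
    [[0.5, 0.5], [0.0, 1.0]],
])
total, aleatoric, epistemic = information_theoretic_decomposition(example_probs)
```

In this toy case both inputs have the same total uncertainty (ln 2), but the formula attributes it entirely to the aleatoric term for the first input and entirely to the epistemic term for the second. The benchmark's finding is that on real models the two estimates do not separate this cleanly and instead tend to be highly correlated.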
Abstract
Uncertainty quantification, once a singular task, has evolved into a spectrum of tasks, including abstained prediction, out-of-distribution detection, and aleatoric uncertainty quantification. The latest goal is disentanglement: the construction of multiple estimators that are each tailored to one and only one source of uncertainty. This paper presents the first benchmark of uncertainty disentanglement. We reimplement and evaluate a comprehensive range of uncertainty estimators, from Bayesian over evidential to deterministic ones, across a diverse range of uncertainty tasks on ImageNet. We find that, despite recent theoretical endeavors, no existing approach provides pairs of disentangled uncertainty estimators in practice. We further find that specialized uncertainty tasks are harder than predictive uncertainty tasks, where we observe saturating performance. Our results provide both practical advice for which uncertainty estimators to use for which specific task, and reveal opportunities for future research toward task-centric and disentangled uncertainties. All our reimplementations and Weights & Biases logs are available at https://github.com/bmucsanyi/untangle.
