A Unified Evaluation Framework for Epistemic Predictions
Shireen Kudukkil Manchingal, Muhammad Mubashar, Kaizheng Wang, Fabio Cuzzolin
TL;DR
The paper addresses the difficulty of comparing uncertainty-aware classifiers whose epistemic predictions take different forms, and introduces a unified evaluation framework for them. It maps every prediction type to a credal set within the probability simplex and defines a metric $\mathcal{E} = d(y,\hat{y}) + \lambda \cdot NS[m]$ that combines an accuracy term $d$ with an imprecision term $NS[m]$ weighted by $\lambda$; here $d$ is instantiated as the KL divergence $D_{KL}$ from the ground truth to the credal-set boundary. The metric enables cross-model ranking and application-specific model selection, and is validated on CIFAR-10, CIFAR-100, and MNIST across multiple uncertainty paradigms. The paper also provides a practical credal-set construction via coherent lower probabilities and Möbius inversion. The framework supports application-driven decisions (e.g., settings that allow abstention versus settings where an action is mandatory) and discusses trade-offs, limitations, and future directions such as training-time losses that optimize the proposed metric directly.
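To make the metric in the TL;DR concrete, here is a minimal Python sketch under several assumptions that are not confirmed by the text above: the credal set is given as a finite list of vertices, $NS[m]$ is taken to be the unnormalized Dubois-Prade non-specificity $\sum_A m(A)\log|A|$, the ground truth $y$ is one-hot, and $d$ is the smallest KL divergence from $y$ to any point of the credal set. All function names are hypothetical and chosen for illustration only.

```python
import math
from itertools import combinations


def lower_probability(vertices, subset):
    """Lower probability of a set of classes: minimum over the credal-set vertices."""
    return min(sum(v[i] for i in subset) for v in vertices)


def mobius_inverse(vertices, n_classes):
    """Mass function obtained by Moebius inversion of the lower probability.

    m(A) = sum over non-empty B subset of A of (-1)^{|A| - |B|} * P_lower(B).
    Masses can be negative if the lower probability is not a belief function.
    """
    subsets = [
        frozenset(c)
        for r in range(1, n_classes + 1)
        for c in combinations(range(n_classes), r)
    ]
    lower = {A: lower_probability(vertices, A) for A in subsets}
    return {
        A: sum((-1) ** (len(A) - len(B)) * lower[B] for B in subsets if B <= A)
        for A in subsets
    }


def non_specificity(masses):
    """Assumed form of NS[m]: sum of m(A) * log|A| (zero for singletons)."""
    return sum(m * math.log(len(A)) for A, m in masses.items())


def kl_to_credal_set(vertices, true_class):
    """Smallest KL(y || p) over the credal set for a one-hot y.

    For one-hot y, KL(y || p) = -log p[true_class], so the minimum is reached
    at the vertex with the largest probability on the true class.
    """
    upper = max(v[true_class] for v in vertices)
    return math.inf if upper == 0.0 else -math.log(upper)


def epistemic_score(vertices, true_class, lam=1.0):
    """E = d(y, y_hat) + lambda * NS[m], under the assumptions stated above."""
    n_classes = len(vertices[0])
    masses = mobius_inverse(vertices, n_classes)
    return kl_to_credal_set(vertices, true_class) + lam * non_specificity(masses)


# Hypothetical three-class credal set with two vertices; the true class is 0.
vertices = [[0.7, 0.2, 0.1], [0.5, 0.4, 0.1]]
print(epistemic_score(vertices, true_class=0, lam=1.0))
```

In this sketch a point prediction (a single vertex) has zero non-specificity, so its score reduces to the KL term alone, while a larger $\lambda$ penalizes wide credal sets more heavily. This is one way to read the abstention-versus-mandatory-action trade-off mentioned above.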
Abstract
Predictions of uncertainty-aware models are diverse, ranging from single point estimates (often averaged over prediction samples) to predictive distributions, to set-valued or credal-set representations. We propose a novel unified evaluation framework for uncertainty-aware classifiers, applicable to a wide range of model classes, which allows users to tailor the trade-off between accuracy and precision of predictions via a suitably designed performance metric. This makes it possible to select the most suitable model for a particular real-world application as a function of the desired trade-off. Our experiments, covering Bayesian, ensemble, evidential, deterministic, credal and belief function classifiers on the CIFAR-10, MNIST and CIFAR-100 datasets, show that the metric behaves as desired.
