A Data-Driven Measure of Relative Uncertainty for Misclassification Detection
Eduardo Dadalto, Marco Romanelli, Georg Pichler, Pablo Piantanida
TL;DR
This work addresses misclassification detection, arguing that conventional uncertainty measures such as Shannon entropy fail to capture the real uncertainty in a model's predictions. It introduces Rel-U, a data-driven, observer-specific uncertainty measure defined as $s_{\textsc{Rel-U}}(\mathbf{x}) = \hat{\mathbf{p}}(\mathbf{x}) D^* \hat{\mathbf{p}}(\mathbf{x})^{\top}$, where $\hat{\mathbf{p}}(\mathbf{x})$ is the model's soft-prediction for input $\mathbf{x}$ and the matrix $D^*$ is learned in closed form from positive and negative samples. The resulting flexible framework, inspired by Rao diversity, separates correctly classified from misclassified samples and is validated on matched and mismatched image classification tasks, where it often outperforms state-of-the-art detectors such as MSP, ODIN, and Doctor. Rel-U shows strong detection performance and robustness to calibration and distribution shifts, suggesting practical utility in safety-critical ML applications. The learned matrix $D^*$ also offers interpretability: it reveals fine-grained class-pair dependencies that conventional entropy measures do not capture.
Abstract
Misclassification detection is an important problem in machine learning, as it allows for the identification of instances where the model's predictions are unreliable. However, conventional uncertainty measures such as Shannon entropy do not provide an effective way to infer the real uncertainty associated with the model's predictions. In this paper, we introduce a novel data-driven measure of uncertainty relative to an observer for misclassification detection. By learning patterns in the distribution of soft-predictions, our uncertainty measure can identify misclassified samples based on the predicted class probabilities. Interestingly, according to the proposed measure, soft-predictions corresponding to misclassified instances can carry a large amount of uncertainty, even though they may have low Shannon entropy. We demonstrate empirical improvements on multiple image classification tasks, outperforming state-of-the-art misclassification detection methods.
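
To make the contrast with Shannon entropy concrete, the following is a minimal sketch of the quadratic-form score $\hat{\mathbf{p}}(\mathbf{x}) D \hat{\mathbf{p}}(\mathbf{x})^{\top}$ from the TL;DR. The 3-class matrix `D` below is hand-picked purely for illustration (zero diagonal, one heavily weighted class pair); it is not the learned $D^*$, and the function names `rel_u_score` and `shannon_entropy` are our own, not taken from the authors' code.

```python
import math


def rel_u_score(p, D):
    """Quadratic form p D p^T for a probability vector p and a C x C matrix D."""
    C = len(p)
    return sum(p[i] * D[i][j] * p[j] for i in range(C) for j in range(C))


def shannon_entropy(p):
    """Shannon entropy in nats, skipping zero-probability classes."""
    return -sum(q * math.log(q) for q in p if q > 0)


# Hypothetical matrix (an assumption, NOT a learned D*): the pair (0, 1)
# is weighted heavily, as if these two classes were often confused.
D = [
    [0.0, 5.0, 1.0],
    [5.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]

# Two made-up soft-predictions over 3 classes.
p_a = [0.90, 0.10, 0.00]  # confident, but mass leaks into the costly pair (0, 1)
p_b = [0.40, 0.00, 0.60]  # less confident, mass spread over a cheap pair (0, 2)

for name, p in (("p_a", p_a), ("p_b", p_b)):
    print(name, "entropy:", shannon_entropy(p), "score:", rel_u_score(p, D))
```

With this toy matrix, `p_a` has the lower Shannon entropy of the two yet the higher quadratic-form score, which mirrors the abstract's observation that a soft-prediction can look confident under entropy while still carrying large relative uncertainty. How $D^*$ is actually obtained from positive and negative samples is not reproduced here.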
