A Data-Driven Measure of Relative Uncertainty for Misclassification Detection

Eduardo Dadalto; Marco Romanelli; Georg Pichler; Pablo Piantanida

TL;DR

This work addresses misclassification detection, arguing that traditional uncertainty measures such as Shannon entropy fail to capture the real uncertainty in model predictions. It introduces Rel-U, a data-driven, observer-specific uncertainty measure defined as $s_{\textsc{Rel-U}}(\mathbf{x}) = \hat{\mathbf{p}}(\mathbf{x}) D^* \hat{\mathbf{p}}(\mathbf{x})^{\top}$, where the distance matrix $D^*$ is learned from positive and negative samples via a closed-form solution. The approach yields a flexible framework inspired by Rao's diversity that discriminates between correctly classified and misclassified samples. It is validated on matched and mismatched image classification tasks, where it often outperforms state-of-the-art detectors such as MSP, ODIN, and Doctor. Rel-U shows strong detection performance and robustness to calibration and distribution shifts, suggesting practical utility in safety-critical ML applications. The learned relative uncertainty matrix $D^*$ also provides interpretability, revealing fine-grained class-pair dependencies that conventional entropy measures do not capture.
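The score above is a quadratic form of the soft-prediction vector. Below is a minimal NumPy sketch. It assumes soft-predictions are stored as the rows of an (N, K) array, and that D is any K x K matrix. Here D is set by hand; it is not the learned $D^*$. The function name `rel_u_score` is hypothetical. As a sanity check, giving every pair of distinct classes unit distance ($D = \mathbf{1} - I$) reduces the quadratic form to the Gini-Simpson index $1 - \sum_k p_k^2$, the classical special case of Rao's diversity.

```python
import numpy as np


def rel_u_score(p: np.ndarray, D: np.ndarray) -> np.ndarray:
    """Quadratic-form uncertainty s(x) = p D p^T for each row of p.

    p: (N, K) array of soft-predictions (each row sums to 1).
    D: (K, K) class-pair distance matrix (hand-set here, not learned).
    Returns an (N,) array of scores.
    """
    p = np.atleast_2d(p)
    # sum_{i,j} p[n, i] * D[i, j] * p[n, j] for every sample n
    return np.einsum("ni,ij,nj->n", p, D, p)


# Unit distance between every pair of distinct classes: D = 1 - I.
K = 3
D_unit = np.ones((K, K)) - np.eye(K)
p = np.array([[0.7, 0.2, 0.1]])
print(rel_u_score(p, D_unit))       # both lines print approximately 0.46
print(1.0 - np.sum(p**2, axis=1))   # Gini-Simpson index of the same vector
```

A one-hot prediction scores 0 under this D, because it places no mass on any pair of distinct classes.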

Abstract

Misclassification detection is an important problem in machine learning, as it allows for the identification of instances where the model's predictions are unreliable. However, conventional uncertainty measures such as Shannon entropy do not provide an effective way to infer the real uncertainty associated with the model's predictions. In this paper, we introduce a novel data-driven measure of uncertainty relative to an observer for misclassification detection. By learning patterns in the distribution of soft-predictions, our uncertainty measure can identify misclassified samples based on the predicted class probabilities. Interestingly, according to the proposed measure, soft-predictions corresponding to misclassified instances can carry a large amount of uncertainty, even though they may have low Shannon entropy. We demonstrate empirical improvements over multiple image classification tasks, outperforming state-of-the-art misclassification detection methods.
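The abstract contrasts the proposed measure with Shannon entropy. The small sketch below illustrates one structural difference. Entropy depends only on the set of probability values, so it cannot tell which classes share the mass. A pairwise matrix D can. The D used here is a hypothetical, hand-chosen matrix, not one learned by the method. The two predictions are illustrative inputs, not data from the paper.

```python
import numpy as np


def shannon_entropy(p: np.ndarray) -> np.ndarray:
    """Shannon entropy (nats) of each row of p, treating 0 * log 0 as 0."""
    p = np.atleast_2d(p)
    logs = np.log(np.where(p > 0, p, 1.0))  # log(1) = 0 wherever p == 0
    return -np.sum(p * logs, axis=1)


def rel_u_score(p: np.ndarray, D: np.ndarray) -> np.ndarray:
    """Quadratic-form score p D p^T for each row of p."""
    p = np.atleast_2d(p)
    return np.einsum("ni,ij,nj->n", p, D, p)


# Hypothetical D: confusing classes 0 and 1 is weighted as far more
# uncertain than confusing classes 0 and 2 (values chosen by hand).
D = np.array([
    [0.0, 2.0, 0.1],
    [2.0, 0.0, 0.1],
    [0.1, 0.1, 0.0],
])

# Same probability values, permuted across classes.
preds = np.array([
    [0.9, 0.1, 0.0],  # residual mass on class 1
    [0.9, 0.0, 0.1],  # residual mass on class 2
])
print(shannon_entropy(preds))  # identical low entropy (about 0.325 nats each)
print(rel_u_score(preds, D))   # about 0.36 vs 0.018
```

Both predictions look equally confident to entropy. Under this D, only the first is flagged as highly uncertain.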

Paper Structure (34 sections, 1 theorem, 20 equations, 7 figures, 5 tables, 1 algorithm)

Introduction
Related Works
A Data-Driven Measure of Uncertainty
From Uncertainty to Misclassification Detection
Misclassification Detection Background
A Data-Driven Measure of Relative Uncertainty for Model’s Predictions
Experiments and Discussion
Misclassification Detection on Matched Data
Mismatched Data
Empirical Interpretation of the Relative Uncertainty Matrix.
Summary and Concluding Remarks
Appendix
Proof of Proposition 1
Algorithm
Details on Baselines and Benchmarks
...and 19 more sections

Key Result

Proposition 1

The constrained optimization problem defined in Eq. (eq:opt_problem) admits a closed-form solution $D^* = \frac{1}{Z} (d^*_{ij})$. The multiplicative constant $Z$ is chosen such that $D^*$ satisfies the condition $\operatorname{Tr}(D^* (D^*)^{\top}) = K$.
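The normalization step in Proposition 1 can be made explicit. Since $\operatorname{Tr}(D D^{\top}) = \sum_{i,j} d_{ij}^2 = \|D\|_F^2$, requiring $\operatorname{Tr}(D^* (D^*)^{\top}) = K$ for $D^* = \frac{1}{Z}(d^*_{ij})$ gives $Z = \|(d^*_{ij})\|_F / \sqrt{K}$. The sketch below implements only this scaling. The unnormalized entries are random placeholders, not the closed-form $d^*_{ij}$ from the paper. The name `normalize_trace` is hypothetical.

```python
import numpy as np


def normalize_trace(d: np.ndarray) -> np.ndarray:
    """Scale an unnormalized K x K matrix so that Tr(D D^T) = K.

    Tr(D D^T) is the squared Frobenius norm of D, so dividing by
    Z = ||d||_F / sqrt(K) gives Tr(D D^T) = K up to rounding.
    """
    K = d.shape[0]
    fro = np.linalg.norm(d, ord="fro")
    if fro == 0.0:
        raise ValueError("cannot normalize an all-zero matrix")
    Z = fro / np.sqrt(K)
    return d / Z


# Placeholder entries standing in for d*_ij (NOT the paper's closed form).
rng = np.random.default_rng(0)
d = rng.random((4, 4))
D_star = normalize_trace(d)
print(np.trace(D_star @ D_star.T))  # approximately 4.0
```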

Figures (7)

Figure 1: Intuitive example illustrating the advantage of Rel-U over entropy-based methods: Rel-U (left-hand heatmap) captures the real uncertainty (central heatmap) much better than Doctor [GraneseRGPP2021NeurIPS]; a detailed analysis is provided in the section "Empirical Interpretation of the Relative Uncertainty Matrix".
Figure 2: Impact of the tuning split size on misclassification detection performance for our method and the Doctor baseline, using a ResNet-34 model trained with supervised cross-entropy loss. Hyperparameters are set to their default values ($T=1.0$, $\epsilon = 0.0$, and $\lambda = 0.5$), so that only the effect of the validation split size is observed.
Figure 3: Ablation studies for temperature, lambda, and noise magnitude effects. The x-axis represents the experimental conditions, while the y-axis shows the performance metric.
Figure 4: Impact of different validation set sizes (in percentage of test split) for mismatch detection.
Figure 5: CIFAR-10 vs CIFAR-10C, ResNet-34, using 10% of the test split for validation.
...and 2 more figures
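Figures 2, 4 and 5 vary how much of the test split is held out for tuning; Figure 5, for example, uses 10%. Figure 2 fixes the default temperature at $T = 1.0$. The data-preparation steps can be sketched as below, under several assumptions. $T$ is taken to be the usual softmax temperature. The tuning split is drawn uniformly at random from the test split. The positive and negative samples are taken to be the correctly classified and misclassified tuning samples. The function names are hypothetical, and the logits and labels are random stand-ins for a real model's outputs.

```python
import numpy as np


def softmax_with_temperature(logits: np.ndarray, T: float = 1.0) -> np.ndarray:
    """Row-wise softmax of logits / T, stabilized by subtracting the row max."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def tuning_split(n: int, fraction: float, seed: int = 0):
    """Randomly assign `fraction` of n test indices to tuning; the rest is held out."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_tune = int(round(fraction * n))
    return perm[:n_tune], perm[n_tune:]


# Random stand-ins for model logits and labels: 1000 samples, 10 classes.
rng = np.random.default_rng(1)
logits = rng.normal(size=(1000, 10))
labels = rng.integers(0, 10, size=1000)

probs = softmax_with_temperature(logits, T=1.0)
tune_idx, eval_idx = tuning_split(len(labels), fraction=0.10)

# Group tuning samples by whether the argmax prediction matches the label.
correct = probs[tune_idx].argmax(axis=1) == labels[tune_idx]
p_correct = probs[tune_idx][correct]
p_misclassified = probs[tune_idx][~correct]
print(len(tune_idx), len(eval_idx))  # 100 900
```

The two groups of soft-predictions are what a data-driven matrix such as $D^*$ would be fit on. The held-out indices stay untouched for evaluation.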

Theorems & Definitions (4)

Definition 1
Proposition 1: Closed-form solution
Remark
Remark
