Interval-Based AUC (iAUC): Extending ROC Analysis to Uncertainty-Aware Classification
Yuqi Li, Matthew M. Engelhard
TL;DR
This work addresses the limitation of standard ROC analysis for interval-valued predictions by introducing interval-based AUC measures, $AUC_L$ and $AUC_U$, and a three-region ROC decomposition that separates definitively correct rankings, definitively incorrect rankings, and rankings left uncertain by interval overlap. The authors establish probabilistic interpretations for $AUC_L$ and $AUC_U$, show their connection to the classical AUC, and prove bounds on the Bayes-optimal AUC, $AUC^*$, under a mild class-conditional coverage assumption, via the miscoverage probability $p_{\text{pair}}$. A new selective-prediction concept, the uncertainty-aware AUC ($uAUC$), enables abstention on ambiguous pairs while preserving a high-quality discriminative core. Empirical validation on the Pima Indians Diabetes dataset demonstrates the accuracy of the theoretical results, the utility of the three-region perspective, and the practical value of uncertainty-aware evaluation for decision-making in high-stakes settings.
Abstract
In high-stakes risk prediction, quantifying uncertainty through interval-valued predictions is essential for reliable decision-making. However, standard evaluation tools such as the receiver operating characteristic (ROC) curve and the area under the curve (AUC) are designed for point scores and fail to capture the impact of predictive uncertainty on ranking performance. We propose an uncertainty-aware ROC framework for interval-valued predictions, introducing two new measures: $AUC_L$ and $AUC_U$. The framework yields an informative three-region decomposition of the ROC plane, partitioning pairwise rankings into correct, incorrect, and uncertain orderings. It naturally supports selective prediction by allowing models to abstain from ranking cases with overlapping intervals, thereby optimizing the trade-off between abstention rate and discriminative reliability. We prove that, under valid class-conditional coverage, $AUC_L$ and $AUC_U$ provide formal lower and upper bounds on the theoretically optimal AUC ($AUC^*$), characterizing the fundamental limit of achievable discrimination. The proposed framework applies broadly to interval-valued prediction models, regardless of the interval construction method. Experiments on real-world benchmark datasets, using bootstrap-based intervals as one instantiation, validate the framework's correctness and demonstrate its practical utility for uncertainty-aware evaluation and decision-making.
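To make the three-region idea concrete, the following is a minimal sketch in Python, not the authors' implementation. The abstract does not state the formal definitions, so the sketch assumes the following: a positive/negative pair is "correct" when the positive interval lies strictly above the negative interval, "incorrect" when it lies strictly below, and "uncertain" whenever the two intervals overlap (touching endpoints count as overlap). Under that assumption, $AUC_L$ is taken as the fraction of correct pairs, $AUC_U$ as the fraction of correct plus uncertain pairs, and $uAUC$ as the fraction of correct pairs among the non-overlapping ones. The function name `interval_auc_summary` and the toy intervals are hypothetical.

```python
import math


def interval_auc_summary(pos_intervals, neg_intervals):
    """Classify every (positive, negative) pair of intervals into three regions.

    Each interval is a (lower, upper) tuple of predicted risk.
    Assumed definitions (not taken from the paper):
      correct   : positive lower bound > negative upper bound
      incorrect : positive upper bound < negative lower bound
      uncertain : the intervals overlap (including touching endpoints)
    """
    correct = incorrect = uncertain = 0
    for lo_p, hi_p in pos_intervals:
        for lo_n, hi_n in neg_intervals:
            if lo_p > hi_n:
                correct += 1
            elif hi_p < lo_n:
                incorrect += 1
            else:
                uncertain += 1

    total = len(pos_intervals) * len(neg_intervals)
    decided = correct + incorrect  # pairs that are not abstained on
    return {
        "correct": correct,
        "incorrect": incorrect,
        "uncertain": uncertain,
        # Assumed lower measure: only definitively correct pairs count.
        "auc_l": correct / total,
        # Assumed upper measure: uncertain pairs resolved optimistically.
        "auc_u": (correct + uncertain) / total,
        # Assumed selective measure: accuracy on non-overlapping pairs.
        "uauc": correct / decided if decided > 0 else math.nan,
        "abstention_rate": uncertain / total,
    }


# Toy intervals, invented purely for illustration.
positives = [(0.6, 0.8), (0.4, 0.7), (0.2, 0.3)]
negatives = [(0.1, 0.3), (0.5, 0.6), (0.35, 0.5)]
summary = interval_auc_summary(positives, negatives)
print(summary)
```

On this toy input, the nine pairs split into 3 correct, 2 incorrect, and 4 uncertain, so the assumed lower and upper measures are 3/9 and 7/9, and abstaining on the 4 overlapping pairs leaves 3 of 5 decided pairs correctly ordered. The double loop is quadratic in the number of cases; it is meant to expose the logic rather than to scale.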
