Interval-Based AUC (iAUC): Extending ROC Analysis to Uncertainty-Aware Classification

Yuqi Li; Matthew M. Engelhard

TL;DR

This work addresses the limitation of standard ROC analysis for interval-valued predictions by introducing interval-based AUC measures, $AUC_L$ and $AUC_U$, and a three-region ROC decomposition that separates definitively correct, definitively incorrect, and uncertain rankings due to interval overlap. The authors establish probabilistic interpretations for $AUC_L$ and $AUC_U$, show their connection to the classical AUC, and prove bounds on the Bayes-optimal AUC $AUC^*$ under mild class-conditional coverage via the miscoverage probability $p_{\text{pair}}$. A new selective prediction concept, the uncertainty-aware AUC ($uAUC$), enables abstention on ambiguous pairs while preserving a high-quality discriminative core. Empirical validation on the Pima Indians Diabetes dataset demonstrates the accuracy of the theoretical results, the utility of the three-region perspective, and the practical value of uncertainty-aware evaluation for decision-making in high-stakes settings.

Abstract

In high-stakes risk prediction, quantifying uncertainty through interval-valued predictions is essential for reliable decision-making. However, standard evaluation tools like the receiver operating characteristic (ROC) curve and the area under the curve (AUC) are designed for point scores and fail to capture the impact of predictive uncertainty on ranking performance. We propose an uncertainty-aware ROC framework specifically for interval-valued predictions, introducing two new measures: $AUC_L$ and $AUC_U$. This framework enables an informative three-region decomposition of the ROC plane, partitioning pairwise rankings into correct, incorrect, and uncertain orderings. This approach naturally supports selective prediction by allowing models to abstain from ranking cases with overlapping intervals, thereby optimizing the trade-off between abstention rate and discriminative reliability. We prove that under valid class-conditional coverage, $AUC_L$ and $AUC_U$ provide formal lower and upper bounds on the theoretical optimal AUC ($AUC^*$), characterizing the physical limit of achievable discrimination. The proposed framework applies broadly to interval-valued prediction models, regardless of the interval construction method. Experiments on real-world benchmark datasets, using bootstrap-based intervals as one instantiation, validate the framework's correctness and demonstrate its practical utility for uncertainty-aware evaluation and decision-making.
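The three-region partition of pairwise rankings described above can be illustrated with a short sketch. It is not the authors' implementation: the function name `three_region_decomposition`, the `(lower, upper)` tuple representation, the toy data, and the use of strict inequalities (so that touching intervals count as overlap) are assumptions made here for illustration. It relies only on the relations given in the Figure 2 caption, namely that $AUC_L$ corresponds to $\mathbb{P}(I_1 > I_0)$ and $1 - AUC_U$ to $\mathbb{P}(I_1 < I_0)$.

```python
from itertools import product


def three_region_decomposition(intervals, labels):
    """Empirical shares of positive/negative pairs that are confidently
    correct, overlapping, and confidently misordered.

    intervals: list of (lower, upper) tuples, with lower <= upper
    labels:    list of 0/1 class labels, same length as intervals
    """
    pos = [iv for iv, y in zip(intervals, labels) if y == 1]
    neg = [iv for iv, y in zip(intervals, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative instance")

    n_pairs = len(pos) * len(neg)
    # I_1 > I_0: the positive interval lies entirely above the negative one.
    correct = sum(p_lo > n_hi for (p_lo, _), (_, n_hi) in product(pos, neg))
    # I_1 < I_0: the positive interval lies entirely below the negative one.
    incorrect = sum(p_hi < n_lo for (_, p_hi), (n_lo, _) in product(pos, neg))
    # Everything else overlaps (assumption: touching endpoints count as overlap).
    overlap = n_pairs - correct - incorrect
    return correct / n_pairs, overlap / n_pairs, incorrect / n_pairs


# Toy example (hypothetical data): two positives, two negatives.
intervals = [(0.6, 0.8), (0.4, 0.7), (0.1, 0.2), (0.3, 0.5)]
labels = [1, 1, 0, 0]

p_correct, p_overlap, p_incorrect = three_region_decomposition(intervals, labels)
auc_l = p_correct          # area under the TPR_L vs. FPR_U curve
auc_u = 1.0 - p_incorrect  # one minus the area above the TPR_U vs. FPR_L curve
print(auc_l, p_overlap, auc_u)  # 0.75 0.25 1.0
```

In this toy example only the pair formed by `(0.4, 0.7)` and `(0.3, 0.5)` overlaps, so three of the four pairs are confidently correct and none is confidently misordered. The quadratic loop over pairs is chosen for readability; a sort-based count would scale better on large datasets.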

Paper Structure (30 sections, 5 theorems, 26 equations, 5 figures, 2 tables)

Introduction
Related Work
Classical ROC Analysis and AUC.
Methods Producing Interval-Valued Predictions.
Evaluation of Prediction Intervals.
Selective Prediction and Abstention.
Research Gap and Our Contribution.
Setup and Theory
Problem Setup and Interval Comparisons
Two ROC-Style Curves and Interval-Based AUC Quantities
Main Theoretical Results
Geometric Interpretation
Connection to Classical AUC
Selective Prediction and Uncertainty-Aware AUC
Estimating the Bounds of the Optimal AUC
...and 15 more sections

Key Result

Theorem 1

Figures (5)

Figure 1: Illustration of interval-based comparisons. Top: pairwise comparisons between a positive instance $I_1$ and a negative instance $I_0$. Bottom: comparisons between an interval prediction $I$ and a decision threshold $t$, illustrating confident decisions ($I > t$ or $I < t$) and ambiguity when the interval overlaps the threshold.
Figure 2: Geometric interpretation of interval-based ROC analysis. The blue curve plots $\mathrm{TPR}_L$ versus $\mathrm{FPR}_U$ (area $= \operatorname{AUC}_L$), and the red curve plots $\mathrm{TPR}_U$ versus $\mathrm{FPR}_L$ (area above $= 1-\operatorname{AUC}_U$). The three regions correspond to $\mathbb{P}(I_1 > I_0)$ (blue), $\mathbb{P}(\text{overlap})$ (white), and $\mathbb{P}(I_1 < I_0)$ (red). Curves are shown for a 90% confidence level from Experiment 1 in Section \ref{sec:exp1}.
Figure 3: Three-region decomposition versus confidence level. The stacked areas show $\mathbb{P}(I_1 > I_0)$ (blue, confident correct rankings), $\mathbb{P}(\text{overlap})$ (gray, ambiguous rankings), and $\mathbb{P}(I_1 < I_0)$ (red, confident misorderings). At the 0% confidence level, the decomposition reduces to the classical point-based case with no overlap. As the confidence level increases from 0% to 99%, probability mass shifts from the two ordered regions to the overlap region, while the total probability remains equal to 1.
Figure 4: Selective prediction metrics versus nominal confidence level. The abstention rate (AR, gray) increases with interval width, while the uncertainty-aware AUC ($uAUC$, blue) stays within $0.83$--$0.94$ and increases monotonically.
Figure 5: Empirical validation of theoretical bounds on the optimal AUC ($AUC^*$). The shaded region denotes the theoretical range $[AUC_L - p_{\text{pair}}, AUC_U + p_{\text{pair}}]$, which strictly contains the optimal AUC across varying miscoverage rates $\alpha$.
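The quantities in the captions of Figures 4 and 5 can be sketched in a few lines. The range $[AUC_L - p_{\text{pair}}, AUC_U + p_{\text{pair}}]$ is taken from the Figure 5 caption. Everything else is an assumption made for illustration: the function names, the reading of the abstention rate as the overlap share $AUC_U - AUC_L$, the reading of $uAUC$ as the share of correctly ordered pairs among non-overlapping pairs, and the clipping of the range to $[0, 1]$. The paper's exact definitions may differ, and $p_{\text{pair}}$ is treated here as a given input rather than estimated.

```python
import math


def selective_metrics(auc_l, auc_u):
    """Abstention rate and uAUC under the assumed definitions above."""
    abstention_rate = auc_u - auc_l      # assumed: share of overlapping pairs
    decided = 1.0 - abstention_rate      # share of pairs that are ranked
    # Assumed: uAUC = P(I_1 > I_0 | no overlap); undefined if nothing is ranked.
    uauc = auc_l / decided if decided > 0 else math.nan
    return abstention_rate, uauc


def optimal_auc_range(auc_l, auc_u, p_pair):
    """Range [AUC_L - p_pair, AUC_U + p_pair] from the Figure 5 caption,
    clipped to [0, 1] (the clipping is an addition made in this sketch)."""
    return max(0.0, auc_l - p_pair), min(1.0, auc_u + p_pair)


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
ar, uauc = selective_metrics(auc_l=0.5, auc_u=0.75)
low, high = optimal_auc_range(auc_l=0.5, auc_u=0.75, p_pair=0.25)
print(ar, low, high)  # 0.25 0.25 1.0
```

With these inputs a quarter of the pairs are abstained on, and two thirds of the remaining pairs are ranked correctly under the assumed $uAUC$ definition.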

Theorems & Definitions (9)

Theorem 1
proof
Theorem 2
proof
Corollary 1: Three-Region Decomposition
Proposition 1: Monotonicity of $uAUC$
Theorem 3: Bounds on the Optimal AUC
proof
proof
