Conformal Prediction for Class-wise Coverage via Augmented Label Rank Calibration

Yuanjie Shi, Subhankar Ghosh, Taha Belkhouja, Janardhan Rao Doppa, Yan Yan

TL;DR

This work addresses the problem of achieving reliable class-wise (per-class) coverage in conformal prediction for multi-class, often imbalanced tasks. It introduces RC3P, which augments standard conformity-score calibration with label-rank calibration to selectively threshold only reliably-ranked classes, ensuring class-conditional coverage regardless of distribution or model. The authors prove the validity of RC3P's coverage and derive mild conditions under which it yields smaller prediction sets than baseline CCP; they also provide practical guidance for parameter choices to maximize efficiency. Extensive experiments on CIFAR-10/100, mini-ImageNet, and Food-101 show RC3P achieving consistent class-wise coverage with sizable reductions in average prediction set size (e.g., around 26% on average across datasets), highlighting its practical impact for uncertainty quantification in imbalanced settings.

Abstract

Conformal prediction (CP) is an emerging uncertainty quantification framework that allows us to construct a prediction set that covers the true label with a pre-specified marginal or conditional probability. Although the valid coverage guarantee has been extensively studied for classification problems, CP often produces large prediction sets, which may not be practically useful. This issue is exacerbated in the setting of class-conditional coverage on classification tasks with many and/or imbalanced classes. This paper proposes the Rank Calibrated Class-conditional CP (RC3P) algorithm to reduce prediction set sizes while achieving class-conditional coverage, where the valid coverage holds for each class. In contrast to the standard class-conditional CP (CCP) method, which uniformly thresholds the class-wise conformity score for each class, the augmented label rank calibration step allows RC3P to selectively apply this class-wise thresholding subroutine only to a subset of classes whose class-wise top-k error is small. We prove that RC3P achieves class-wise coverage regardless of the classifier and data distribution. We also show that RC3P reduces the size of prediction sets compared to the CCP method. Comprehensive experiments on multiple real-world datasets demonstrate that RC3P achieves class-wise coverage and a 26.25% reduction in prediction set sizes on average.
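
The two-step calibration described in the abstract can be sketched as follows. This is an illustrative sketch in plain Python, not the authors' implementation. Several choices are assumptions not stated in the text: the non-conformity score is one minus the predicted probability, $\widehat{k}(y)$ is taken as the smallest $k$ whose class-wise top-$k$ error is below $\alpha$, and $\widehat{\alpha}_y$ is set to $\alpha$ minus that error. The function names `label_ranks`, `calibrate`, and `predict_set` are hypothetical.

```python
import math


def label_ranks(probs):
    """Rank every label of one example; rank 1 is the most probable label."""
    order = sorted(range(len(probs)), key=lambda y: -probs[y])
    ranks = [0] * len(probs)
    for position, y in enumerate(order, start=1):
        ranks[y] = position
    return ranks


def calibrate(cal_probs, cal_labels, alpha):
    """Return per-class rank cutoffs k_hat and score thresholds q_hat."""
    num_classes = len(cal_probs[0])
    k_hat = [num_classes] * num_classes
    q_hat = [math.inf] * num_classes
    for y in range(num_classes):
        rows = [p for p, label in zip(cal_probs, cal_labels) if label == y]
        n_y = len(rows)
        if n_y == 0:
            continue  # no calibration data: always include this class
        true_ranks = [label_ranks(p)[y] for p in rows]
        # Assumed non-conformity score: 1 - probability of the true label.
        scores = sorted(1.0 - p[y] for p in rows)
        # Assumed rule: smallest k whose class-wise top-k error is below alpha.
        for k in range(1, num_classes + 1):
            top_k_error = sum(r > k for r in true_ranks) / n_y
            if top_k_error < alpha:
                break
        # Assumed rule: spend the remaining error budget on the score threshold.
        alpha_hat = alpha - top_k_error
        j = math.ceil((n_y + 1) * (1.0 - alpha_hat))
        k_hat[y] = k
        if j <= n_y:
            q_hat[y] = scores[j - 1]  # j-th smallest calibration score
    return k_hat, q_hat


def predict_set(probs, k_hat, q_hat):
    """Keep label y only if it passes both the score and the rank check."""
    ranks = label_ranks(probs)
    return [
        y for y in range(len(probs))
        if 1.0 - probs[y] <= q_hat[y] and ranks[y] <= k_hat[y]
    ]
```

In this sketch, dropping the rank check and using $\alpha$ directly as the per-class error level gives a CCP-style rule that thresholds the class-wise score alone.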

Paper Structure (26 sections, 6 theorems, 29 equations, 28 figures, 24 tables, 1 algorithm)

This paper contains 26 sections, 6 theorems, 29 equations, 28 figures, 24 tables, 1 algorithm.

Key Result

Theorem 4.1

(Class-conditional coverage of RC3P) Suppose that the selected values $\widehat{k}(y)$ result in the class-wise top-$k$ error $\epsilon_y^{\widehat{k}(y)}$ for each class $y \in \mathcal{Y}$. For a target class-conditional coverage $1-\alpha$, if we set $\widehat{\alpha}_y$ and $\widehat{k}(y)$ in RC3P, then RC3P can achieve the class-conditional coverage for every $y \in \mathcal{Y}$:
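
In standard notation, class-conditional coverage at level $1-\alpha$ can be written as below. The prediction-set symbol $\widehat{\mathcal{C}}(X)$ is an assumed notation that does not appear in the text here, and the admissible ranges for $\widehat{\alpha}_y$ and $\widehat{k}(y)$ are not reproduced.

$$
\mathbb{P}\left\{ Y \in \widehat{\mathcal{C}}(X) \mid Y = y \right\} \ge 1 - \alpha, \qquad \forall\, y \in \mathcal{Y}.
$$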

Figures (28)

  • Figure 1: Class-conditional coverage (top row) and prediction set size (bottom row) achieved by CCP, Cluster-CP, and RC3P when $\alpha = 0.1$ and models are trained for $200$ epochs on four imbalanced datasets with imbalance type EXP and $\rho=0.1$. RC3P overlaps with CCP on CIFAR-10. RC3P's class-conditional coverage is more densely concentrated above $0.9$ (the target $1-\alpha$ class-conditional coverage) than that of CCP and Cluster-CP, with significantly smaller prediction sets on CIFAR-100, mini-ImageNet, and Food-101.
  • Figure 2: Normalized frequency distribution of the label ranks included in the prediction sets of CCP, Cluster-CP, and RC3P with imbalance type EXP and $\rho=0.1$, when $\alpha = 0.1$ and models are trained for $200$ epochs. The normalized frequencies produced by RC3P tend to be lower than those produced by CCP and Cluster-CP. Furthermore, the tail of the probability density function for label ranks in the RC3P prediction sets is notably shorter than that of the other methods.
  • Figure 3: Verification of the condition numbers $\{\sigma_y\}_{y=1}^K$ with imbalance type EXP and $\rho=0.1$, when $\alpha = 0.1$ and models are trained for $200$ epochs. Vertical dashed lines mark the value $1$, and all the condition numbers are smaller than $1$. This verifies the condition of Lemma 4.2, and thus confirms that RC3P, which calibrates on both non-conformity scores and label ranks, produces smaller prediction sets than CCP.
  • Figure 4: Illustrative examples of the different imbalanced distributions of the number of training examples per class index $c$ on CIFAR-100
  • Figure 5: Class-conditional coverage (top row) and prediction set size (bottom row) achieved by CCP, Cluster-CP, and RC3P when $\alpha = 0.1$ on the CIFAR-10, CIFAR-100, mini-ImageNet, and Food-101 datasets with imbalance type EXP and imbalance ratio $\rho=0.5$. RC3P overlaps with CCP on CIFAR-10. RC3P's class-conditional coverage is more densely concentrated above $0.9$ (the target $1-\alpha$ class-conditional coverage) than that of CCP and Cluster-CP, with significantly smaller prediction sets on CIFAR-100, mini-ImageNet, and Food-101.
  • ...and 23 more figures
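
The captions above refer to imbalance type EXP with ratio $\rho$, which is not defined in this summary. A minimal sketch, assuming the convention common in long-tailed benchmarks where per-class training counts decay exponentially from the largest class to $\rho$ times that size, is given below. The function name `exp_class_counts` and its parameters are hypothetical, and the paper's exact formula may differ.

```python
def exp_class_counts(n_max, num_classes, rho):
    """Per-class counts decaying exponentially from n_max down to rho * n_max.

    Assumed convention: class c gets n_max * rho ** (c / (num_classes - 1)).
    """
    if num_classes == 1:
        return [n_max]
    return [
        round(n_max * rho ** (c / (num_classes - 1)))
        for c in range(num_classes)
    ]
```

Under this assumption, `exp_class_counts(500, 10, 0.1)` starts at 500 examples for class 0 and ends at 50 for the last class.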

Theorems & Definitions (9)

  • Theorem 4.1
  • Lemma 4.2
  • Theorem 4.3
  • Theorem A.1
  • proof
  • Theorem A.2
  • proof
  • Theorem A.3
  • proof