Conformal Prediction Sets with Improved Conditional Coverage using Trust Scores

Jivat Neet Kaur, Michael I. Jordan, Ahmed Alaa

TL;DR

Standard conformal prediction guarantees only marginal coverage. This paper instead targets approximate conditional coverage on a reduced set of variables rather than the full input. It introduces a practical conformal prediction variant that conditions on a low-dimensional statistic $V$ consisting of the classifier's confidence (Conf) and a nonparametric trust score (Trust), so that coverage improves where miscoverage is most problematic: overconfident incorrect predictions. The method learns a threshold from a finite-dimensional function class over Conf and Trust using quantile regression, yielding prediction sets that target $P(Y \in \mathcal{C}(X) \mid V) \ge 1-\alpha$, with a randomized version capable of exact conditional guarantees. Empirically, the approach improves conditional coverage on several large-scale image datasets (ImageNet, Places365, and their long-tail variants) and a dermatology dataset (Fitzpatrick 17k): it reduces CovGap across $\hbox{Conf} \times \hbox{Trust}$ and $\hbox{Conf} \times \hbox{Rank}$ bins and improves class-conditional and subgroup coverage without large increases in set size. The work offers a practical path toward more actionable and fair uncertainty quantification in high-stakes classification tasks by focusing on regions of the input space where miscoverage is most consequential.
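
To make the idea of a threshold that depends on Conf and Trust concrete, the following is a minimal sketch, not the paper's implementation. It assumes the simplest possible function class, indicator functions of Conf × Trust bins, for which quantile regression reduces to taking a separate quantile in each bin (here with the usual finite-sample correction). The conformity score `1 - p(y | x)`, the bin edges, and all function names are illustrative assumptions rather than details from the paper.

```python
import math
from collections import defaultdict


def bin_index(value, edges):
    """Index of the bin that `value` falls into, given sorted inner edges."""
    return sum(value >= e for e in edges)


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the order statistic to use
    if k > n:
        return math.inf  # too few points: the set must contain every label
    return sorted(scores)[k - 1]


def fit_binned_thresholds(conf, trust, scores, conf_edges, trust_edges, alpha):
    """One threshold per (Conf bin, Trust bin) cell of the calibration data."""
    groups = defaultdict(list)
    for c, t, s in zip(conf, trust, scores):
        key = (bin_index(c, conf_edges), bin_index(t, trust_edges))
        groups[key].append(s)
    return {key: conformal_quantile(vals, alpha) for key, vals in groups.items()}


def prediction_set(probs, threshold):
    """Labels whose assumed score 1 - p(y | x) is at most the threshold."""
    return [y for y, p in enumerate(probs) if 1.0 - p <= threshold]
```

At test time one would look up the threshold for the test point's (Conf, Trust) cell and pass it to `prediction_set`. A smoother function class, such as a polynomial basis, requires solving the pinball-loss minimization numerically instead of taking per-bin quantiles.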

Abstract

Standard conformal prediction offers a marginal guarantee on coverage, but for prediction sets to be truly useful, they should ideally ensure coverage conditional on each test point. Unfortunately, it is impossible to achieve exact, distribution-free conditional coverage in finite samples. In this work, we propose an alternative conformal prediction algorithm that targets coverage where it matters most: in instances where a classifier is overconfident in its incorrect predictions. We start by dissecting miscoverage events in marginally-valid conformal prediction, and show that miscoverage rates vary based on the classifier's confidence and its deviation from the Bayes optimal classifier. Motivated by this insight, we develop a variant of conformal prediction that targets coverage conditional on a reduced set of two variables: the classifier's confidence in a prediction and a nonparametric trust score that measures its deviation from the Bayes classifier. Empirical evaluation on multiple image datasets shows that our method generally improves conditional coverage properties compared to standard conformal prediction, including class-conditional coverage, coverage over arbitrary subgroups, and coverage over demographic groups.
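
The abstract does not spell out how the trust score is computed. As a rough illustration only, the sketch below assumes a common nearest-neighbor distance-ratio construction in the style of Jiang et al. (2018): the distance from a point to the closest training example of any other class, divided by the distance to the closest training example of the predicted class. It omits any density-based filtering of the training set, and the function name and arguments are hypothetical.

```python
import math


def trust_score(x, predicted_label, train_points, train_labels):
    """Simplified distance-ratio trust score (assumed form, no density filtering).

    Larger values mean the point lies closer to the predicted class than to
    any other class, i.e. the prediction agrees with the local neighborhood.
    Assumes the training set contains both the predicted class and at least
    one other class.
    """
    d_pred = math.inf   # distance to nearest training point of the predicted class
    d_other = math.inf  # distance to nearest training point of any other class
    for point, label in zip(train_points, train_labels):
        d = math.dist(x, point)
        if label == predicted_label:
            d_pred = min(d_pred, d)
        else:
            d_other = min(d_other, d)
    if d_pred == 0.0:
        return math.inf  # x coincides with a training point of the predicted class
    return d_other / d_pred
```

A high-confidence prediction with a low score under this construction is the kind of overconfident, possibly incorrect case that the method targets.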

Paper Structure

This paper contains 39 sections, 3 theorems, 20 equations, 7 figures, and 5 tables.

Key Result

Theorem 1

Suppose $\{(X_i, Y_i)\}_{i=1}^{n+1}$ are independent and identically distributed. Let $\mathcal{F} = \{\Phi(\cdot)^\top\beta : \beta \in {\mathbb{R}}^d\}$ denote the class of linear functions over the basis $\Phi: \mathcal{X} \to \mathbb{R}^d$. If the distribution of $s \mid X$ is continuous, then f

Figures (7)

  • Figure 1: Miscoverage patterns in standard conformal prediction. Illustration of conditional coverage of standard conformal prediction ($\textsc{standard}$) over regions of the feature space binned by model confidence and rank of the true class (top) ($\hbox{Conf} \times \hbox{Rank}$), and confidence and trust score (bottom) ($\hbox{Conf} \times \hbox{Trust}$). We set $\alpha = 0.1$; hence, red bins indicate undercoverage (coverage $<$ 0.9) and green bins indicate overcoverage (coverage $>$ 0.9). We split samples into equal-size bins based on rank and trust score. As a special case for ImageNet, we manually define the rank bins, because roughly 75% of test samples are predicted correctly. Models are calibrated using temperature scaling.
  • Figure 2: Average coverage gap between randomly sampled Euclidean balls of fixed radius. The radius varies along the x-axis. Error bars show standard errors.
  • Figure 3: Conditional coverage of $\textsc{standard}$ (top) and $\textsc{conditional}$ (bottom) over regions of the feature space binned by model confidence and trust score ($\hbox{Conf} \times \hbox{Trust}$).
  • Figure 4: Conditional coverage of $\textsc{standard}$ (top) and $\textsc{conditional}$ (bottom) over regions of the feature space binned by model confidence and rank of the true class ($\hbox{Conf} \times \hbox{Rank}$).
  • Figure 5: Effect of polynomial degree $d$ on CovGap and Size.
  • ...and 2 more figures
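
The binned coverage values shown in these figures, and a CovGap-style summary of them, can be computed along the following lines. This is a hedged sketch: the excerpt names CovGap but does not define it, so the code assumes the common definition, the mean absolute deviation of per-bin coverage from the target $1-\alpha$, reported in percentage points. The function names and the choice of bin identifiers are illustrative.

```python
from collections import defaultdict


def binned_coverage(bin_ids, covered):
    """Empirical coverage within each bin.

    bin_ids: one hashable bin identifier per test point, e.g. (conf_bin, trust_bin)
    covered: one boolean per test point, True if the true label is in the set
    """
    hits = defaultdict(int)
    counts = defaultdict(int)
    for b, c in zip(bin_ids, covered):
        counts[b] += 1
        hits[b] += int(c)
    return {b: hits[b] / counts[b] for b in counts}


def cov_gap(bin_ids, covered, alpha):
    """Assumed CovGap: mean |coverage_b - (1 - alpha)| over bins, times 100."""
    coverage = binned_coverage(bin_ids, covered)
    target = 1.0 - alpha
    return 100.0 * sum(abs(c - target) for c in coverage.values()) / len(coverage)
```

With $\alpha = 0.1$, a bin whose coverage is below 0.9 corresponds to a red (undercovered) cell in the figures and one above 0.9 to a green (overcovered) cell; both contribute to the gap.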

Theorems & Definitions (3)

  • Theorem 1 (Theorem 2 in gibbs2023conformal)
  • Corollary 1
  • Theorem 2 (Proposition 4 in gibbs2023conformal)