Enhancing Conformal Prediction via Class Similarity

Ariel Fargion, Lahav Dabah, Tom Tirer

TL;DR

This paper proposes augmenting the CP score function with a term that penalizes predictions with out-of-group errors, and shows mathematically that this strategy can also reduce the average set size of any CP score function.

Abstract

Conformal Prediction (CP) has emerged as a powerful statistical framework for high-stakes classification applications. Instead of predicting a single class, CP generates a prediction set, guaranteed to include the true label with a pre-specified probability. The performance of different CP methods is typically assessed by their average prediction set size. In setups where the classes can be partitioned into semantic groups, e.g., diseases that require similar treatment, users can benefit from prediction sets that are not only small on average, but also contain a small number of semantically different groups. This paper begins by addressing this problem and ultimately offers a widely applicable tool for boosting any CP method on any dataset. First, given a class partition, we propose augmenting the CP score function with a term that penalizes predictions with out-of-group errors. We theoretically analyze this strategy and prove its advantages for group-related metrics. Surprisingly, we show mathematically that, for common class partitions, it can also reduce the average set size of any CP score function. Our analysis reveals the class similarity factors behind this improvement and motivates us to propose a model-specific variant, which does not require any human semantic partition and can further reduce the prediction set size. Finally, we present an extensive empirical study, encompassing prominent CP methods, multiple models, and several datasets, which demonstrates that our class-similarity-based approach consistently enhances CP methods.
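The augmentation idea can be illustrated with a small split-CP experiment. This is a minimal sketch under stated assumptions, not the authors' implementation. The penalty form is an assumption: it adds λ times the softmax mass the model places outside the group of the candidate label y. The paper's exact term may differ. The helpers `out_of_group_mass`, `augmented_scores` and `split_cp`, as well as the synthetic group-structured data, are hypothetical. The base score is 1 minus the softmax of the candidate label. The sketch reports coverage, average set size and average number of groups per set for several values of λ, so the trade-off can be inspected directly.

```python
import math
import numpy as np

def out_of_group_mass(probs, groups):
    """Hypothetical penalty: softmax mass outside the group of each candidate label.

    probs:  (m, K) softmax outputs
    groups: (K,) integer group id of every class
    returns (m, K) array whose [i, y] entry is 1 - sum_{c in G(y)} probs[i, c]
    """
    same_group = (groups[:, None] == groups[None, :]).astype(float)  # (K, K), symmetric
    return 1.0 - probs @ same_group

def augmented_scores(base_scores, probs, groups, lam):
    """Assumed augmented score: base score plus lam times the out-of-group mass."""
    return base_scores + lam * out_of_group_mass(probs, groups)

def split_cp(scores, labels, n_cal, alpha):
    """Calibrate on the first n_cal rows; return boolean prediction sets for the rest."""
    cal_scores = scores[:n_cal][np.arange(n_cal), labels[:n_cal]]
    k = math.ceil((n_cal + 1) * (1 - alpha))
    q_hat = np.inf if k > n_cal else np.sort(cal_scores)[k - 1]
    return scores[n_cal:] <= q_hat

# Synthetic data (assumption): 20 classes in 5 groups of 4, and a model whose
# confusions stay mostly inside the true label's group.
rng = np.random.default_rng(0)
K, G, m, n_cal, alpha = 20, 5, 6000, 2000, 0.1
groups = np.repeat(np.arange(G), K // G)
labels = rng.integers(0, K, size=m)
logits = rng.normal(size=(m, K))
logits[np.arange(m), labels] += 2.0                           # signal on the true class
logits += 1.5 * (groups[None, :] == groups[labels][:, None])  # boost same-group classes
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
base = 1.0 - probs                                            # s(x, y) = 1 - softmax_y(x)

group_onehot = np.eye(G)[groups]                              # (K, G) class-to-group map
test_labels = labels[n_cal:]
results = {}
for lam in [0.0, 0.5, 1.0, 2.0]:
    sets = split_cp(augmented_scores(base, probs, groups, lam), labels, n_cal, alpha)
    coverage = sets[np.arange(len(test_labels)), test_labels].mean()
    avg_size = sets.sum(axis=1).mean()
    avg_groups = ((sets.astype(float) @ group_onehot) > 0).sum(axis=1).mean()
    results[lam] = (coverage, avg_size, avg_groups)
    print(f"lambda={lam}: coverage={coverage:.3f}, size={avg_size:.2f}, groups={avg_groups:.2f}")
```

Because the augmented score is still a valid score function, split CP keeps its marginal coverage for every λ. Whether the set size and group count actually shrink depends on the model and the class partition, which is what the paper's analysis addresses.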

Paper Structure

This paper contains 23 sections, 5 theorems, 28 equations, 5 figures, 8 tables.

Key Result

Theorem 3.1

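Suppose that $\{\left(X_i, Y_i\right)\}_{i=1}^n$ and $(X_{n+1},Y_{n+1})$ are i.i.d. Let $\hat{q}$ be the calibration quantile from step 2 of the split CP procedure, i.e., the $\lceil (n+1)(1-\alpha) \rceil / n$ empirical quantile of the calibration scores $s(X_1, Y_1), \dots, s(X_n, Y_n)$. Let $\mathcal{C}_\alpha(X_{n+1}) = \{y : s(X_{n+1}, y) \leq \hat{q}\}$ be the prediction set from step 3. Then, $\mathbb{P}\left(Y_{n+1} \in \mathcal{C}_\alpha(X_{n+1})\right) \geq 1 - \alpha$.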
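The guarantee in Theorem 3.1 can be checked numerically. This is a minimal sketch assuming the standard split-CP recipe. It uses the score s(x, y) = 1 minus the softmax of y and synthetic i.i.d. data whose labels are drawn from the model's own softmax. The helper names `conformal_quantile` and `prediction_sets` are illustrative, not taken from the paper. The empirical coverage on fresh test points should land near or above 1 - α.

```python
import math
import numpy as np

def conformal_quantile(cal_scores, alpha):
    """ceil((n+1)(1-alpha))/n empirical quantile of the n calibration scores."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the order statistic to use
    if k > n:
        return np.inf                     # too few points: every label must be included
    return np.sort(cal_scores)[k - 1]

def prediction_sets(test_scores, q_hat):
    """Boolean (m, K) mask of labels y with s(x, y) <= q_hat."""
    return test_scores <= q_hat

# Synthetic, i.i.d. data (assumption): labels are drawn from the model's own softmax.
rng = np.random.default_rng(1)
K, n_cal, n_test, alpha = 10, 1000, 5000, 0.1
probs = rng.dirichlet(0.5 * np.ones(K), size=n_cal + n_test)
labels = np.array([rng.choice(K, p=p) for p in probs])
scores = 1.0 - probs                      # s(x, y) = 1 - softmax_y(x)

cal_true = scores[:n_cal][np.arange(n_cal), labels[:n_cal]]
q_hat = conformal_quantile(cal_true, alpha)
sets = prediction_sets(scores[n_cal:], q_hat)
coverage = sets[np.arange(n_test), labels[n_cal:]].mean()
print(f"q_hat={q_hat:.3f}, coverage={coverage:.3f}, avg set size={sets.sum(axis=1).mean():.2f}")
```

The ceiling in the quantile rank is the finite-sample correction that makes the bound hold for any n. With plain (1 - α)-quantiles, the coverage can fall slightly short.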

Figures (5)

  • Figure 1: Illustration of prediction sets for an example before and after applying our proposed regularization. Each circle corresponds to a class, with colors indicating superclasses. Filled circles denote classes included in the prediction set, and circle size reflects the softmax value. In this example, the prediction set size decreases from 4 to 3, and the number of superclasses represented decreases from 3 to 2. We show that, for common class partitions, our regularization can reduce the average prediction set size regardless of the baseline score function.
  • Figure 2: The effect of $\lambda$ on the average set size (blue) and the number of superclasses (red) for CIFAR-100 with ResNet50 and the RAPS score.
  • Figure 3: Comparison of zoomed 10×10 regions of the model-specific (left) and model-agnostic (right) similarity matrices.
  • Figure 4: Average set size (blue) and worst class-conditional coverage deviation (red) as functions of the hyperparameter $\lambda$ for the model-specific variant with the RAPS score on CIFAR-100 with ResNet50. The average set size reaches a minimum at an intermediate value of $\lambda$, while the worst class-conditional coverage deviation remains stable.
  • Figure 5: Comparison of the ResNet50 model-specific similarity matrix and the original superclass matrix of CIFAR-100.
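The coverage metric in Figure 4 can be made concrete with a short sketch. The definition below is an assumption, since the exact formula is not given here. It takes the worst class-conditional coverage deviation to be the largest absolute gap, over classes, between the empirical coverage on test points of that class and the target 1 - α. The function name `worst_class_coverage_deviation` is hypothetical.

```python
import numpy as np

def worst_class_coverage_deviation(sets, labels, num_classes, alpha):
    """Assumed metric: max_c |coverage on test points with label c - (1 - alpha)|.

    sets:   (m, K) boolean prediction-set mask
    labels: (m,) true labels of the test points
    """
    labels = np.asarray(labels)
    covered = sets[np.arange(len(labels)), labels]   # was the true label in the set?
    deviations = []
    for c in range(num_classes):
        mask = labels == c
        if mask.any():                               # skip classes absent from the test split
            deviations.append(abs(covered[mask].mean() - (1 - alpha)))
    return max(deviations)

# Toy usage: each class is covered on half of its test points.
toy_sets = np.array([[True, False], [True, True], [False, True], [True, False]])
print(worst_class_coverage_deviation(toy_sets, [0, 1, 0, 1], num_classes=2, alpha=0.1))
```

A stable value of this quantity across λ indicates that the regularization shrinks sets without concentrating under-coverage on particular classes.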

Theorems & Definitions (10)

  • Theorem 3.1: Theorem 1 in angelopoulos2021gentle
  • Lemma 4.1
  • proof
  • Proposition 4.2
  • proof
  • Corollary 4.3
  • proof
  • Definition 4.4
  • Theorem 4.5
  • proof