Trustworthy Classification through Rank-Based Conformal Prediction Sets

Rui Luo, Zhixin Zhou

TL;DR

This work tackles uncertainty quantification for multiclass classification by proposing RANK, a rank-based conformal prediction method that constructs prediction sets with guaranteed coverage $1-\alpha$ without requiring well-calibrated probabilities. It develops a rank-based conformity score, provides a theoretical analysis linking the expected set size to the rank distribution, and validates the approach with extensive experiments across image and NLP tasks. Results show that RANK often achieves the target coverage with smaller prediction sets than the THR and APS baselines, improving the reliability and efficiency of uncertainty quantification. This work advances practical deployment of ML systems by enabling calibration-free, distribution-free uncertainty quantification for modern classifiers.

Abstract

Machine learning classification tasks often benefit from predicting a set of possible labels with confidence scores to capture uncertainty. However, existing methods struggle with the high-dimensional nature of the data and the lack of well-calibrated probabilities from modern classification models. We propose a novel conformal prediction method that employs a rank-based score function suitable for classification models that predict the order of labels correctly, even if not well-calibrated. Our approach constructs prediction sets that achieve the desired coverage rate while managing their size. We provide a theoretical analysis of the expected size of the conformal prediction sets based on the rank distribution of the underlying classifier. Through extensive experiments, we demonstrate that our method outperforms existing techniques on various datasets, providing reliable uncertainty quantification. Our contributions include a novel conformal prediction method, theoretical analysis, and empirical evaluation. This work advances the practical deployment of machine learning systems by enabling reliable uncertainty quantification.

Paper Structure

This paper contains 11 sections, 2 theorems, 11 equations, 5 figures, 1 table, 1 algorithm.

Key Result

Proposition 1

The output $\widehat{C}_\alpha(x_{n+1})$ from Algorithm 1 satisfies $\widehat{C}_\alpha(x_{n+1})\subset\{\widehat{y}_{(1)}, \dots, \widehat{y}_{(r^*_\alpha)}\}$, i.e., it is contained in the set of labels with the top-$r^*_\alpha$ values in $\{\widehat{\pi}_1(x_{n+1}), \dots, \widehat{\pi}_K(x_{n+1})\}$.
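
Proposition 1 bounds the prediction set by the top-$r^*_\alpha$ labels. As a minimal sketch of that enclosing set (not the paper's Algorithm itself), the snippet below returns the indices of the `r` largest predicted probabilities. The function name `top_r_labels`, the zero-based class indices, and the tie-breaking rule are illustrative assumptions.

```python
def top_r_labels(probs, r):
    """Return the indices of the r largest entries of probs, largest first."""
    if not 1 <= r <= len(probs):
        raise ValueError("r must be between 1 and the number of classes")
    # Sort class indices by descending probability.
    # Ties are broken by the lower class index (an assumption of this sketch).
    order = sorted(range(len(probs)), key=lambda k: (-probs[k], k))
    return order[:r]


# Hypothetical predicted probabilities for a 5-class problem.
probs = [0.1, 0.55, 0.0, 0.2, 0.15]
superset = top_r_labels(probs, 3)  # classes 1, 3 and 4 hold the top-3 values
```

By Proposition 1, any prediction set produced with threshold rank `r` would be a subset of the list returned here.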

Figures (5)

  • Figure 1: The figure illustrates the construction of a 90% prediction set for a test sample with sorted probability vector $[0.55, 0.2, \textcolor{red}{0.15}, 0.1, 0, 0, 0, 0, 0, 0]$. The top three classes are included based on both the probability of ranks (Left) and the distribution of the 3rd largest probabilities in the calibration set (Right).
  • Figure 2: Rank distribution plots of the true class ranks for different datasets. The vertical red line indicates the rank threshold where the cumulative probability exceeds 0.90, corresponding to $r^*_\alpha$ from Algorithm 1. This value aligns with the average prediction set size for $\alpha=0.1$ in Table 1. For CIFAR-100 (cifar100) and 20 Newsgroups (news20), we plot only the rank distribution for the top 10 ranks.
  • Figure 3: Results for Image Classification on MNIST (mnist), Fashion-MNIST (fmnist), CIFAR-10 (cifar10), and CIFAR-100 (cifar100).
  • Figure 4: Results for the Multi-Choice Question-Answering datasets.
  • Figure 5: Results for Topic Classification on AG News (agnews) and 20 Newsgroups (news20), and for Emotion Recognition on CARER (carer) and TweetEval (tweet).
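
The rank threshold described in the Figure 2 caption can be illustrated with a short sketch. It assumes that `true_ranks` holds the 1-based rank of the true label for each calibration sample, and it returns the smallest rank whose empirical cumulative frequency reaches $1-\alpha$. The function name and the plain empirical rule are assumptions of this sketch; any finite-sample adjustment that the paper's Algorithm may apply is not reproduced here.

```python
def rank_threshold(true_ranks, alpha):
    """Smallest rank r such that at least a (1 - alpha) fraction of
    calibration samples have their true label ranked r or better."""
    n = len(true_ranks)
    if n == 0:
        raise ValueError("need at least one calibration sample")
    max_rank = max(true_ranks)
    covered = 0
    for r in range(1, max_rank + 1):
        # Add the calibration samples whose true label sits exactly at rank r.
        covered += sum(1 for t in true_ranks if t == r)
        if covered >= (1 - alpha) * n:
            return r
    return max_rank


# Hypothetical calibration ranks: 7 samples at rank 1, 3 at rank 2,
# 1 at rank 3 and 1 at rank 4 (12 samples in total).
ranks = [1] * 7 + [2] * 3 + [3] + [4]
r_star = rank_threshold(ranks, alpha=0.1)  # cumulative counts 7, 10, 11 vs. 10.8 needed
```

In this toy example the cumulative count first reaches the required 10.8 samples at rank 3, so the threshold is 3.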

Theorems & Definitions (4)

  • Proposition 1
  • Definition 1: Exchangeability
  • Theorem 4.1
  • proof