Conformal Prediction for Long-Tailed Classification

Tiffany Ding, Jean-Baptiste Fermanian, Joseph Salmon

TL;DR

This paper tackles uncertainty quantification in long-tailed multi-class classification by designing conformal prediction (CP) procedures that guarantee marginal coverage while balancing set size against class-conditional coverage. It introduces a score function that targets macro-coverage, prevalence-adjusted softmax (PAS), and a weighted variant, WPAS; used with Standard CP, these scores approximate optimal macro-coverage with small prediction sets. It also proposes Interp-Q, which interpolates smoothly between classwise and marginal CP. The methods are evaluated on Pl@ntNet-300K and iNaturalist-2018: PAS achieves Pareto-optimal trade-offs, Interp-Q provides tunable control over the balance between set size and coverage, and WPAS boosts tail-class coverage when desired. Overall, the approach enables practical, scalable uncertainty quantification for long-tailed domains such as biodiversity identification and rare-event detection, while preserving marginal guarantees and offering flexible control over the trade-off between set size and class-conditional coverage.

Abstract

Many real-world classification problems, such as plant identification, have extremely long-tailed class distributions. In order for prediction sets to be useful in such settings, they should (i) provide good class-conditional coverage, ensuring that rare classes are not systematically omitted from the prediction sets, and (ii) be a reasonable size, allowing users to easily verify candidate labels. Unfortunately, existing conformal prediction methods, when applied to the long-tailed setting, force practitioners to make a binary choice between small sets with poor class-conditional coverage and extremely large sets with very good class-conditional coverage. We propose methods with guaranteed marginal coverage that smoothly trade off between set size and class-conditional coverage. First, we introduce a new conformal score function called prevalence-adjusted softmax that targets macro-coverage, a relaxed notion of class-conditional coverage. Second, we propose a new procedure that interpolates between marginal and class-conditional conformal prediction by linearly interpolating their conformal score thresholds. We demonstrate our methods on Pl@ntNet-300K and iNaturalist-2018, two long-tailed image datasets with 1,081 and 8,142 classes, respectively.
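
The threshold interpolation described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the function names, the convention that lower scores mean better conformity, the choice that `tau = 1` recovers classwise CP and `tau = 0` recovers marginal CP, and the finite `cap` used when a class has too few calibration examples for its quantile to exist.

```python
import math

def conformal_quantile(scores, alpha, cap=1.0):
    """Return the ceil((n+1)(1-alpha))-th smallest score.

    If that rank exceeds n (too few calibration points), return `cap`.
    `cap` is a hypothetical finite stand-in for an infinite threshold.
    """
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if n == 0 or rank > n:
        return cap
    return sorted(scores)[rank - 1]

def interp_q_thresholds(cal_scores, cal_labels, num_classes, alpha, tau, cap=1.0):
    """Per-class thresholds: tau * classwise quantile + (1 - tau) * marginal quantile."""
    q_marginal = conformal_quantile(cal_scores, alpha, cap)
    thresholds = []
    for y in range(num_classes):
        # Calibration scores of the examples whose true label is y.
        class_scores = [s for s, lab in zip(cal_scores, cal_labels) if lab == y]
        q_classwise = conformal_quantile(class_scores, alpha, cap)
        thresholds.append(tau * q_classwise + (1 - tau) * q_marginal)
    return thresholds

def prediction_set(scores_per_class, thresholds):
    """Include class y when its score is at most the threshold for y."""
    return [y for y, s in enumerate(scores_per_class) if s <= thresholds[y]]
```

Here `cal_scores[i]` is the conformal score of the true label of calibration example `i`, and `scores_per_class[y]` is the score of candidate label `y` for a test input. Sweeping `tau` over [0, 1] for a fixed `alpha` traces a curve between the two existing procedures.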

Paper Structure

This paper contains 46 sections, 9 theorems, 65 equations, 14 figures, 4 tables, 1 algorithm.

Key Result

Proposition 1

The solutions of eq:min_st_macrocoverage and eq:max_macrocoverage are of the form for some threshold $t$ that depends on $\alpha$ or $\kappa$, respectively.
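
A thresholded prediction set built from a prevalence-adjusted score can be sketched as follows. The score form used here, the negated softmax probability divided by the estimated class prevalence, is an assumption suggested by the name "prevalence-adjusted softmax" and is not taken from the statement above; the function names and the toy numbers in the usage note are likewise hypothetical.

```python
import math

def pas_score(softmax_probs, prevalence):
    """Assumed score per class: -softmax / prevalence (lower = more conforming)."""
    return [-p / w for p, w in zip(softmax_probs, prevalence)]

def standard_cp_threshold(cal_scores, alpha):
    """Single marginal threshold: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points: include every class
    return sorted(cal_scores)[rank - 1]

def predict_set(softmax_probs, prevalence, threshold):
    """Classes whose assumed score is at most the threshold."""
    scores = pas_score(softmax_probs, prevalence)
    return [y for y, s in enumerate(scores) if s <= threshold]
```

With prevalences `[0.9, 0.09, 0.01]` and softmax output `[0.7, 0.2, 0.1]`, a threshold of `-2.0` keeps the two rarer classes and drops the common one, because dividing by prevalence favors labels the model rates highly relative to how often they occur.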

Figures (14)

  • Figure 1: The number of training examples of each species in Pl@ntNet-300K (garcin2021pl). We highlight threatened species, as defined by the International Union for Conservation of Nature (https://iucn.org), which are particularly important to identify for biodiversity monitoring purposes. Note that most of these species are in the tail of the distribution.
  • Figure 2: Class distributions (sorted by prevalence), plotted using a logarithmic scale, of the classical train, val, and test sets in the datasets we experiment on. We further randomly split 30% of val to use for model validation and use the remaining 70% as the calibration set ${\mathcal{D}}_{\mathrm{cal}}$. We use the truncated versions when it is important to have good estimates of class-conditional metrics.
  • Figure 3: Average set size vs. FracBelow50%, UnderCovGap, MacroCov, and MarginalCov for various methods on the two datasets. For $\textsc{Interp-Q}$, lines are used to trace out the trade-off curve achieved by running the method with different $\tau$ values for a fixed $\alpha$. For FracBelow50% and UnderCovGap, it is better to be closer to the bottom left. For MacroCov, the bottom right is better. For MarginalCov, we want to be to the right of the dotted line at $1-\alpha$ for the $\alpha$ at which the method is run.
  • Figure 4: Results for running $\textsc{Standard}$ on Pl@ntNet-300K with different conformal score functions: $\mathsf{softmax}$, $\mathsf{PAS}$, and $\mathsf{WPAS}$ with $\lambda \in \{1,10,10^2,10^3\}$. Increasing $\lambda$ in $\mathsf{WPAS}$ improves the class-conditional coverage of at-risk classes, which is measured using $\hat{c}_y$, the empirical class-conditional coverage of class $y$. "At-risk average $\hat{c}_y$" is computed as $(1/|{\mathcal{Y}}_{\text{at-risk}}|)\sum_{y \in {\mathcal{Y}}_{\text{at-risk}}} \hat{c}_y$ and "not-at-risk average $\hat{c}_y$" is computed analogously. Note that here the y-axis is on a linear scale.
  • Figure 5: Class-conditional decision accuracies for a range of decision makers when presented with sets from $\textsc{Standard}$, $\textsc{Classwise}$ or $\textsc{Standard}$ with $\mathsf{PAS}$ at $\alpha=0.1$. Classes are ordered by decreasing decision accuracy of $H_{\mathrm{expert}}$ under each method.
  • ...and 9 more figures
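
The coverage quantities named in the captions above can be computed from prediction sets and true labels along the following lines. The average over a class subset follows the formula in the Figure 4 caption; treating macro-coverage as the same average over all classes is an assumption based on the metric's name, and the function names are hypothetical.

```python
def class_conditional_coverage(pred_sets, labels, num_classes):
    """Empirical coverage c_hat[y] for each class y that appears in `labels`."""
    hits = [0] * num_classes
    counts = [0] * num_classes
    for pred, y in zip(pred_sets, labels):
        counts[y] += 1
        if y in pred:
            hits[y] += 1
    # Classes with no test examples are skipped rather than assigned a value.
    return {y: hits[y] / counts[y] for y in range(num_classes) if counts[y] > 0}

def average_coverage(c_hat, classes):
    """Mean of c_hat over a subset of classes, e.g. the at-risk classes."""
    present = [c_hat[y] for y in classes if y in c_hat]
    return sum(present) / len(present)

def marginal_coverage(pred_sets, labels):
    """Fraction of examples whose true label is in the prediction set."""
    return sum(y in pred for pred, y in zip(pred_sets, labels)) / len(labels)
```

Calling `average_coverage(c_hat, at_risk_classes)` gives the "at-risk average", and passing the complement gives the "not-at-risk average". In a long-tailed test set, marginal coverage is dominated by frequent classes, so it can be high while the averages over rare classes remain low.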

Theorems & Definitions (17)

  • Proposition 1: Informal
  • Proposition 2: Informal
  • Proposition 3
  • Proposition 4: Informal
  • Proposition 5
  • Proposition 6: Formal version of \ref{prop:macro_weighted_informal}
  • Remark 1
  • Proof
  • Lemma B.1: neyman1933ix
  • Remark 2
  • ...and 7 more