Class conditional conformal prediction for multiple inputs by p-value aggregation

Jean-Baptiste Fermanian, Mohamed Hebiri, Joseph Salmon

TL;DR

This paper extends conformal prediction to settings where multiple observations of the same instance are available for classification, addressing the challenge of preserving class-conditional coverage while leveraging information from all inputs. It develops a rigorous p-value aggregation framework that relies on the exact joint distribution of conformal p-values, enabling efficient prediction-set construction via score-based envelopes (quantile, area, and distance-based). The authors introduce practical aggregation methods, including refined majority voting and several envelope-based scores, and validate them on synthetic mixtures and the LifeCLEF Plant Identification Task, showing improved informativeness (smaller sets) without sacrificing coverage. The work highlights the importance of exchangeability for multi-input aggregation, proposes randomized p-values to handle ties, and offers a pathway to scalable, class-conditional uncertainty quantification in citizen-science and similar multi-view classification scenarios.
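The randomized p-values mentioned above can be illustrated with the standard smoothed conformal p-value. The following is a minimal sketch, not the authors' code: the function name `randomized_conformal_pvalue` and its arguments are hypothetical, and it assumes nonconformity scores where larger values mean a worse fit, with calibration scores taken from the candidate class only.

```python
import numpy as np

def randomized_conformal_pvalue(cal_scores, test_score, rng):
    """Smoothed conformal p-value for one candidate class (illustrative sketch).

    cal_scores: nonconformity scores of calibration points from the candidate
                class (assumption: larger score = less conforming).
    test_score: score of the test observation evaluated at that class.
    rng:        a numpy Generator used for the tie-breaking uniform draw.
    """
    cal_scores = np.asarray(cal_scores, dtype=float)
    n = cal_scores.size
    greater = np.sum(cal_scores > test_score)   # strictly less conforming points
    ties = np.sum(cal_scores == test_score)     # calibration points tied with the test score
    u = rng.uniform()                           # randomization spreads the mass of ties
    return (greater + u * (ties + 1)) / (n + 1)

rng = np.random.default_rng(0)
p = randomized_conformal_pvalue([0.1, 0.2, 0.3], 0.25, rng)
```

Without ties the value reduces to `(greater + u) / (n + 1)`. With ties, the uniform draw makes the p-value exactly uniform under exchangeability rather than merely conservative.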

Abstract

Conformal prediction methods are statistical tools designed to quantify uncertainty and generate predictive sets with guaranteed coverage probabilities. This work introduces an innovative refinement to these methods for classification tasks, specifically tailored for scenarios where multiple observations (multi-inputs) of a single instance are available at prediction time. Our approach is particularly motivated by applications in citizen science, where multiple images of the same plant or animal are captured by individuals. Our method integrates the information from each observation into conformal prediction, enabling a reduction in the size of the predicted label set while preserving the required class-conditional coverage guarantee. The approach is based on the aggregation of conformal p-values computed from each observation of a multi-input. By exploiting the exact distribution of these p-values, we propose a general aggregation framework using an abstract scoring function, encompassing many classical statistical tools. Knowledge of this distribution also enables refined versions of standard strategies, such as majority voting. We evaluate our method on simulated and real data, with a particular focus on Pl@ntNet, a prominent citizen science platform that facilitates the collection and identification of plant species through user-submitted images.
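To make the aggregation idea concrete, here is a hedged sketch that calibrates an aggregated p-value score by Monte Carlo simulation. It is a stand-in for the exact distribution used in the paper, not a reproduction of it. It relies on the fact that, when the $n$ calibration scores and the $m$ test scores are exchangeable and have no ties, the joint law of the conformal p-values depends only on $n$ and $m$, so it can be simulated with uniform scores. All function names are hypothetical, and the mean of the p-values is only an illustrative choice of scoring function.

```python
import numpy as np

def simulate_null_pvalues(n, m, n_sim, rng):
    """Draw n_sim vectors of m conformal p-values under exchangeability."""
    cal = rng.uniform(size=(n_sim, n))    # simulated calibration scores
    test = rng.uniform(size=(n_sim, m))   # simulated scores of the m observations
    # p_j = (1 + #{i : cal_i >= test_j}) / (n + 1), computed by broadcasting
    counts = (cal[:, None, :] >= test[:, :, None]).sum(axis=2)
    return (1 + counts) / (n + 1)

def mean_aggregate(pvals):
    """Illustrative scoring function: average of the m p-values."""
    return np.mean(pvals, axis=-1)

def calibrate_threshold(aggregate, n, m, alpha, n_sim, rng):
    """Conservative Monte Carlo estimate of the alpha-quantile of the score."""
    sims = np.sort(aggregate(simulate_null_pvalues(n, m, n_sim, rng)))
    k = int(np.floor(alpha * (n_sim + 1)))
    # With k = 0 no rejection is possible at this level, so keep every class.
    return sims[k - 1] if k >= 1 else -np.inf

def keep_class(pvals, threshold, aggregate=mean_aggregate):
    """Keep the candidate class unless its aggregated score is too small."""
    return bool(aggregate(np.asarray(pvals, dtype=float)) >= threshold)

rng = np.random.default_rng(0)
threshold = calibrate_threshold(mean_aggregate, n=100, m=5, alpha=0.1,
                                n_sim=2000, rng=rng)
in_set = keep_class([0.9, 0.8, 0.7, 0.95, 0.6], threshold)
```

Repeating `keep_class` for every candidate label, each with the calibration scores of that label, would give a prediction set for the multi-input. The threshold is reusable across labels that share the same calibration size $n$, since it depends only on $n$, $m$, and $\alpha$.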

Paper Structure

This paper contains 42 sections, 10 theorems, 46 equations, 10 figures, 2 tables, 1 algorithm.

Key Result

Lemma 3.1

Let $n,m \in \mathbb{N}^*$ and $P \sim {\mathcal{U}}(A_{n,m})$, where ${\mathcal{U}}$ denotes the uniform distribution. Then

Figures (10)

  • Figure 1: Simulation of 150 samples of $P\sim {\mathcal{U}}(A_{n,m})$ for $m=10$ and $n=100$ (black lines). Quantile is the envelope associated with $v_Q$, and Majority is the one associated with the majority vote.
  • Figure 2: Synthetic data: marginal coverage, average length, and minimum of the class-conditional coverages as a function of the number of observations for $\alpha =0.1$, computed on $5000$ repetitions. The confidence region is $\pm$ (empirical std)/$\sqrt{5000}$.
  • Figure 3: LifeCLEF Plant Identification Task 2015: average length in log scale for $\alpha =0.1$ and two different choices of temperature, as a function of the number of observations.
  • Figure 4: Distribution of the averaged softmax scores for synthetic data. $5000$ averaged scores are computed and sorted for $m\in \{1,3,5,7,9\}$.
  • Figure 5: Empirical coverage and average lengths for the class-conditional approach using two naive calibration sets for synthetic data. The calibration set is of size $5000$; the coverage and the length are evaluated on $300$ multi-inputs, repeated $1000$ times. The confidence region is $\pm$ (empirical std)/$\sqrt{5000}$.
  • ...and 5 more figures

Theorems & Definitions (19)

  • Lemma 3.1
  • Remark 3.2
  • Remark 3.4
  • Theorem 3.5
  • Remark 3.6
  • Corollary 3.7
  • Proposition 4.1
  • Proposition 4.2
  • Remark 4.3
  • Proposition 5.1
  • ...and 9 more