Backward Conformal Prediction

Etienne Gauthier, Francis Bach, Michael I. Jordan

TL;DR

Backward Conformal Prediction (BCP) addresses the fixed-coverage limitation of standard conformal prediction by enforcing a data-driven size constraint $\mathcal{T}$ on prediction sets while preserving marginal coverage guarantees through e-values. The framework pairs conformal e-prediction with an adaptive miscoverage level $\tilde{\alpha}$ and introduces a leave-one-out estimator $\hat{\alpha}^{\rm LOO}$ that estimates $\mathbb{E}[\tilde{\alpha}]$ from calibration data, making the guarantees practically computable. Theoretical results show that $|\hat{\alpha}^{\rm LOO} - \mathbb{E}[\tilde{\alpha}]| = O_P(1/\sqrt{n})$ under mild conditions, and experiments on CIFAR-10 and a medical dataset illustrate effective size control and reliable coverage. This approach offers a flexible, interpretable uncertainty quantification tool for high-stakes domains where small, informative prediction sets are crucial.

Abstract

We introduce $\textit{Backward Conformal Prediction}$, a method that guarantees conformal coverage while providing flexible control over the size of prediction sets. Unlike standard conformal prediction, which fixes the coverage level and allows the conformal set size to vary, our approach defines a rule that constrains how prediction set sizes behave based on the observed data, and adapts the coverage level accordingly. Our method builds on two key foundations: (i) recent results by Gauthier et al. [2025] on post-hoc validity using e-values, which ensure marginal coverage of the form $\mathbb{P}(Y_{\rm test} \in \hat C_n^{\tilde{\alpha}}(X_{\rm test})) \ge 1 - \mathbb{E}[\tilde{\alpha}]$ up to a first-order Taylor approximation for any data-dependent miscoverage $\tilde{\alpha}$, and (ii) a novel leave-one-out estimator $\hat{\alpha}^{\rm LOO}$ of the marginal miscoverage $\mathbb{E}[\tilde{\alpha}]$ based on the calibration set, ensuring that the theoretical guarantees remain computable in practice. This approach is particularly useful in applications where large prediction sets are impractical, such as medical diagnosis. We provide theoretical results and empirical evidence supporting the validity of our method, demonstrating that it maintains computable coverage guarantees while ensuring interpretable, well-controlled prediction set sizes.

Paper Structure

This paper contains 22 sections, 12 theorems, 115 equations, 15 figures, and 1 algorithm.

Key Result

Proposition 2.2

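Consider a calibration set $\{(X_i,Y_i)\}_{i=1}^n$ and a test data point $(X_{\rm test},Y_{\rm test})$ such that $(X_1,Y_1),\dotsc,(X_n,Y_n),(X_{\rm test},Y_{\rm test})$ are exchangeable. Let $\tilde{\alpha} > 0$ be any miscoverage level that may depend on these data. Then, up to a first-order Taylor approximation, the conformal e-prediction set $\hat C_n^{\tilde{\alpha}}(X_{\rm test})$ satisfies

$$\mathbb{P}\big(Y_{\rm test} \in \hat C_n^{\tilde{\alpha}}(X_{\rm test})\big) \ge 1 - \mathbb{E}[\tilde{\alpha}].$$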
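The following is a minimal Python sketch of the kind of conformal e-prediction set that Proposition 2.2 refers to. It assumes that the e-value of a candidate label $y$ is the ratio of its nonconformity score $S(X_{\rm test}, y)$ to the average of all $n+1$ scores, which is consistent with the Figure 2 caption. It also assumes that $\hat C_n^{\alpha}(X_{\rm test})$ keeps the labels whose e-value is below $1/\alpha$. The function names, the strictly positive score convention (larger means less conforming), and the array layout are illustrative assumptions, not the authors' code.

```python
import numpy as np

def soft_rank_evalues(cal_scores, test_label_scores):
    # cal_scores[i] = S(X_i, Y_i) for the n calibration points;
    # test_label_scores[y] = S(X_test, y) for every candidate label y.
    # E(y) = S(X_test, y) / mean of the n + 1 scores (scores assumed > 0).
    cal_scores = np.asarray(cal_scores, dtype=float)
    test_label_scores = np.asarray(test_label_scores, dtype=float)
    n = cal_scores.size
    return (n + 1) * test_label_scores / (cal_scores.sum() + test_label_scores)

def e_prediction_set(cal_scores, test_label_scores, alpha):
    # Keep every label whose e-value stays below the threshold 1 / alpha.
    e = soft_rank_evalues(cal_scores, test_label_scores)
    return np.flatnonzero(e < 1.0 / alpha)
```

Consider four calibration scores equal to 1 and candidate scores 0.5, 1 and 4. The e-values are then about 0.56, 1 and 2.5. With $\alpha = 0.5$ the set keeps labels 0 and 1, and with $\alpha = 0.1$ it keeps all three. A smaller miscoverage level therefore gives a larger set.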

Figures (15)

  • Figure 1: Overview of Backward Conformal Prediction. The procedure first fixes a (potentially data-dependent) size constraint rule $\mathcal{T}$, then constructs a conformal set $\hat{C}_n^{\tilde{\alpha}}(X_{\rm test})$ using an adaptive miscoverage level $\tilde{\alpha}$ chosen to respect the size constraint. A leave-one-out estimator $\hat{\alpha}^{\rm LOO}$ is computed on the calibration set to estimate the marginal miscoverage $\mathbb{E}[\tilde{\alpha}]$, enabling practitioners to decide whether to trust or reject the resulting conformal set based on the estimated coverage.
  • Figure 2: The left panel illustrates the definition of the true marginal miscoverage $\mathbb{E}[\tilde{\alpha}]$, which depends on the ratio between the test score $S(X_{\rm test},\cdot)$ and the average of all $n{+}1$ scores. The right panel depicts the leave-one-out estimator $\hat{\alpha}^{\rm LOO}$: each calibration point $j$ yields a pseudo-miscoverage $\tilde{\alpha}_j$ by comparing $S(X_j,\cdot)$ to the average calibration score. Averaging these gives $\hat{\alpha}^{\rm LOO}$, which approximates $\mathbb{E}[\tilde{\alpha}]$ without using the test score. Feature-label pairs are denoted $Z_i := (X_i, Y_i)$.
  • Figure 3: Histograms of $1 - \tilde{\alpha}$ and $1 - \hat{\alpha}^{\rm LOO}$ from $N = 200$ runs for various $(n, \mathcal{T})$ configurations. The red dashed line shows the empirical coverage probability. See text for details.
  • Figure 4: Sample histogram 1 of the values $\tilde{\alpha}_j$, for $j = 1, \dots, n$, used to compute $\hat{\alpha}^{\rm LOO}$ with $\mathcal{T}=1$ and $n = 5000$.
  • Figure 5: Sample histogram 2 of the values $\tilde{\alpha}_j$, for $j = 1, \dots, n$, used to compute $\hat{\alpha}^{\rm LOO}$ with $\mathcal{T}=1$ and $n = 5000$.
  • ...and 10 more figures
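The pipeline in Figures 1 and 2 can be sketched in Python as follows. This is a hypothetical illustration under stated assumptions, not the authors' implementation. The e-value of a candidate label is taken to be its score divided by the average of the reference scores plus its own, which is the ratio described in the Figure 2 caption. The size rule is taken to be a fixed integer $\mathcal{T}$, and $\tilde{\alpha}$ is the smallest level whose set contains at most $\mathcal{T}$ labels. For $\hat{\alpha}^{\rm LOO}$, each calibration point $j$ is treated as a pseudo-test point scored against the other $n-1$ calibration scores. All function names are hypothetical.

```python
import numpy as np

# Scores are assumed strictly positive, with larger meaning less conforming.

def evalues_against(ref_sum, n_ref, cand_scores):
    # Soft-rank e-value: candidate score over the mean of the n_ref reference
    # scores plus the candidate itself, i.e. (n_ref + 1) * S / (ref_sum + S).
    cand_scores = np.asarray(cand_scores, dtype=float)
    return (n_ref + 1) * cand_scores / (ref_sum + cand_scores)

def backward_threshold(evalues, T):
    # {y : E(y) < t} has at most T labels iff t <= the (T+1)-th smallest
    # e-value, so the largest admissible threshold t = 1/alpha is that value.
    e_sorted = np.sort(np.asarray(evalues, dtype=float))
    if T >= e_sorted.size:
        raise ValueError("size constraint T must be smaller than the number of labels")
    return e_sorted[T]

def backward_conformal(cal_true_scores, test_label_scores, T):
    # Returns (alpha_tilde, kept label indices) for one test input.
    # alpha_tilde is the smallest level meeting the size rule; it exceeds 1
    # when the (T+1)-th smallest e-value is below 1.
    cal_true_scores = np.asarray(cal_true_scores, dtype=float)
    e = evalues_against(cal_true_scores.sum(), cal_true_scores.size, test_label_scores)
    t = backward_threshold(e, T)
    return 1.0 / t, np.flatnonzero(e < t)

def alpha_loo(cal_label_scores, cal_labels, T):
    # cal_label_scores[j, y] = S(X_j, y); cal_labels[j] = Y_j.
    cal_label_scores = np.asarray(cal_label_scores, dtype=float)
    n = cal_label_scores.shape[0]
    true_scores = cal_label_scores[np.arange(n), cal_labels]
    total = true_scores.sum()
    alphas = np.empty(n)
    for j in range(n):
        # Point j plays the test point against the other n - 1 calibration scores.
        e_j = evalues_against(total - true_scores[j], n - 1, cal_label_scores[j])
        alphas[j] = 1.0 / backward_threshold(e_j, T)
    # The average of the pseudo-miscoverages estimates E[alpha_tilde].
    return alphas.mean(), alphas
```

In this sketch $\tilde{\alpha}$ is the reciprocal of the $(\mathcal{T}+1)$-th smallest e-value, so a tighter size constraint can only raise it. Figure 1 suggests that the decision to trust or reject the resulting sets is based on the estimated coverage. Comparing $1 - \hat{\alpha}^{\rm LOO}$ with a target coverage level is one way to make that decision.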

Theorems & Definitions (22)

  • Definition 1.1: Size constraint rule
  • Definition 2.1: E-variable
  • Proposition 2.2 (Gauthier et al. [2025])
  • Theorem 3.1
  • Remark 3.2
  • Theorem 3.3
  • Remark 3.4
  • Theorem 3.5
  • Lemma C.1
  • proof
  • ...and 12 more