Confidence on the Focal: Conformal Prediction with Selection-Conditional Coverage

Ying Jin, Zhimei Ren

TL;DR

This work addresses uncertainty quantification for focal units in conformal prediction by introducing JOMI, a reference-set-based framework that achieves selection-conditional coverage under data-driven selection. By swapping calibration and test units, JOMI constructs a reference set whose scores remain exchangeable with the test score conditional on the selection event, which yields calibrated prediction sets for multiple test units. The method accommodates a broad class of selection rules, including top-K selection, conformal p-value thresholds, and selection based on preliminary conformal prediction sets, with computationally efficient implementations and finite-sample coverage guarantees. Empirical results in drug discovery and health risk prediction illustrate that marginal conformal intervals can under- or over-cover the selected units, while JOMI maintains the promised coverage and adapts interval lengths to the selection context.

Abstract

Conformal prediction builds marginally valid prediction intervals that cover the unknown outcome of a randomly drawn test point with a prescribed probability. However, in practice, data-driven methods are often used to identify specific test unit(s) of interest, requiring uncertainty quantification tailored to these focal units. In such cases, marginally valid conformal prediction intervals may fail to provide valid coverage for the focal unit(s) due to selection bias. This paper presents a general framework for constructing a prediction set with finite-sample exact coverage, conditional on the unit being selected by a given procedure. The general form of our method accommodates arbitrary selection rules that are invariant to the permutation of the calibration units, and generalizes Mondrian Conformal Prediction to multiple test units and non-equivariant classifiers. We also work out computationally efficient implementations of our framework for a number of realistic selection rules, including top-K selection, optimization-based selection, selection based on conformal p-values, and selection based on properties of preliminary conformal prediction sets. The performance of our methods is demonstrated via applications in drug discovery and health risk prediction.
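
To make the idea concrete, the following is a minimal sketch (not the authors' implementation) for one simple case: top-K selection that ranks test units by a point prediction and does not look at the calibration data. The function names, the absolute-residual score, and the stable tie-breaking are all assumptions introduced for illustration. A calibration unit enters the reference set of test unit `j` if, placed in `j`'s position, it would also be selected; the interval then uses the conformal quantile of the reference scores only.

```python
import numpy as np

def topk_selected(test_preds, k):
    """Indices of the k largest predictions (ties broken by index order)."""
    order = np.argsort(-test_preds, kind="stable")
    return set(order[:k].tolist())

def reference_set(calib_preds, test_preds, j, k):
    """Calibration indices that would be selected if swapped into position j.

    Assumption: the selection rule depends only on the test predictions,
    so the swap only changes the prediction in slot j.
    """
    ref = []
    for i, p in enumerate(calib_preds):
        swapped = test_preds.copy()
        swapped[j] = p  # calibration unit i takes the place of test unit j
        if j in topk_selected(swapped, k):
            ref.append(i)
    return np.array(ref, dtype=int)

def conformal_quantile(scores, alpha):
    """(1 - alpha) quantile of the scores augmented with +infinity."""
    n = len(scores)
    rank = int(np.ceil((1 - alpha) * (n + 1)))
    if rank > n:
        return np.inf  # too few reference scores: the set is unbounded
    return np.sort(scores)[rank - 1]

def jomi_interval(calib_preds, calib_y, test_preds, j, k, alpha):
    """Interval for selected test unit j using absolute-residual scores."""
    scores = np.abs(calib_y - calib_preds)
    ref = reference_set(calib_preds, test_preds, j, k)
    q = conformal_quantile(scores[ref], alpha)
    return test_preds[j] - q, test_preds[j] + q
```

When the reference set is small, the quantile can be infinite and the interval becomes the whole real line; this is the price of conditioning on a selection event that few calibration units would reproduce.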

Paper Structure

This paper contains 11 sections, 4 theorems, 3 equations, and 2 figures.

Key Result

Proposition 1

Suppose a family of prediction sets $\{\widehat{C}_{\alpha,n+j}^{(\ell)}\}_{\ell\in {\mathcal{L}}}$ satisfies $\mathbb{P} (Y_{n+j} \in \widehat{C}_{\alpha,n+j}^{(\ell)} {\,|\,} j\in \widehat{{\mathcal{S}}},~\widehat{{\mathcal{S}}} \in \mathfrak{S}_\ell ) \geq 1-\alpha$ for a set of disjoint taxonomies

Figures (2)

  • Figure 1: Visualization of the intuition behind the reference set. (a) Marginally, the calibration data (black) are exchangeable with respect to the test point (blue). (b) The calibration data are not exchangeable with respect to the test point (shaded blue) given a selection event. (c) We find calibration data which, when posited as a "test point", would lead to the same selection event. (d) The reference set consists of calibration data that are exchangeable with respect to the test point given selection, and we use them to construct JOMI prediction sets.
  • Figure 2: Graphical illustration of ${\mathcal{D}}_{\textnormal{calib}}^{\textnormal{swap}(i,j)}(y)$ and ${\mathcal{D}}_{\textnormal{test}}^{{\textnormal{swap}(i,j)}}$.
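
The swap in Figure 2 can be sketched in code under one reading of the notation, which is an assumption here: in the swapped calibration set, unit $i$ is replaced by the test covariate paired with a hypothesized outcome $y$, and in the swapped test set, position $j$ holds the covariate of calibration unit $i$. The helper names and the toy selection rule `above_mean_rule` are hypothetical; any rule that is invariant to permuting the calibration units could be passed instead.

```python
from typing import Callable, List, Set, Tuple

Unit = Tuple[float, float]  # (covariate x, outcome y)
SelectionRule = Callable[[List[Unit], List[float]], Set[int]]

def swap_datasets(calib: List[Unit], test_x: List[float],
                  i: int, j: int, y: float):
    """Exchange calibration unit i with test unit j, hypothesizing outcome y."""
    calib_swap = list(calib)          # copy so the inputs stay untouched
    calib_swap[i] = (test_x[j], y)    # test unit j joins the calibration set
    test_swap = list(test_x)
    test_swap[j] = calib[i][0]        # calibration unit i becomes "test unit j"
    return calib_swap, test_swap

def reference_set(select: SelectionRule, calib: List[Unit],
                  test_x: List[float], j: int, y: float) -> List[int]:
    """Calibration indices i for which position j is still selected after the swap."""
    ref = []
    for i in range(len(calib)):
        calib_swap, test_swap = swap_datasets(calib, test_x, i, j, y)
        if j in select(calib_swap, test_swap):
            ref.append(i)
    return ref

def above_mean_rule(calib: List[Unit], test_x: List[float]) -> Set[int]:
    """Toy rule: select test units whose covariate exceeds the mean calibration outcome."""
    mean_y = sum(u[1] for u in calib) / len(calib)
    return {l for l, x in enumerate(test_x) if x > mean_y}
```

Because this rule depends on the calibration outcomes, the reference set can change with the hypothesized `y`, so a prediction set built this way would generally need the reference set recomputed for each candidate outcome.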

Theorems & Definitions (5)

  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Remark 1: Interpretation of selection-conditional coverage and FCR control
  • Theorem 1