Confidence on the Focal: Conformal Prediction with Selection-Conditional Coverage
Ying Jin, Zhimei Ren
TL;DR
This work addresses uncertainty quantification for focal units, i.e., test units picked out by a data-driven selection rule, by introducing JOMI, a reference-set-based conformal prediction framework that achieves selection-conditional coverage. By swapping calibration and test units, JOMI constructs a reference set of calibration units whose scores remain exchangeable with the focal unit's score conditional on the selection event, which yields calibrated prediction sets even when there are multiple test units. The method accommodates a broad class of selection rules, including top-K selection, thresholds on conformal p-values, and selection based on properties of preliminary conformal prediction sets, with computationally efficient implementations and finite-sample coverage guarantees. Empirical results in drug discovery and health risk prediction illustrate that marginal conformal intervals can under- or over-cover the selected units, whereas JOMI maintains the promised coverage and adapts interval lengths to the selection rule.
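The swapping idea can be illustrated with a minimal Python sketch for one simple special case: a top-K rule that ranks test units by a selection score computed from covariates only. This is one reading of the description above, not the authors' implementation; the function names (`topk_selected`, `swap_reference_set`, `reference_quantile`) and the toy inputs are hypothetical, and selection rules that also depend on calibration data or on the hypothesized outcome would need a more general construction.

```python
import numpy as np

def topk_selected(test_sel_scores, k):
    """Indices of the k test units with the largest selection scores."""
    order = np.argsort(-np.asarray(test_sel_scores, dtype=float), kind="stable")
    return set(order[:k].tolist())

def swap_reference_set(cal_sel_scores, test_sel_scores, j, k):
    """Calibration indices i for which test slot j is still selected
    after calibration unit i is swapped into that slot."""
    ref = []
    for i, s in enumerate(np.asarray(cal_sel_scores, dtype=float)):
        swapped = np.array(test_sel_scores, dtype=float)  # fresh copy per swap
        swapped[j] = s                                     # unit i takes slot j
        if j in topk_selected(swapped, k):
            ref.append(i)
    return np.array(ref, dtype=int)

def reference_quantile(cal_conf_scores, ref, alpha):
    """Conformal quantile using only the nonconformity scores of the
    reference set (returns +inf when the reference set is too small)."""
    scores = np.sort(np.asarray(cal_conf_scores, dtype=float)[ref])
    r = scores.size
    rank = int(np.ceil((1 - alpha) * (r + 1)))
    return scores[rank - 1] if rank <= r else np.inf
```

In this sketch, the prediction set for a selected test unit would keep every candidate outcome whose nonconformity score is at most `reference_quantile(...)`. When few calibration units survive the swap, the quantile is infinite and the set is uninformative, which is the price of conditioning on a stringent selection event.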
Abstract
Conformal prediction builds marginally valid prediction intervals that cover the unknown outcome of a randomly drawn test point with a prescribed probability. However, in practice, data-driven methods are often used to identify specific test unit(s) of interest, requiring uncertainty quantification tailored to these focal units. In such cases, marginally valid conformal prediction intervals may fail to provide valid coverage for the focal unit(s) due to selection bias. This paper presents a general framework for constructing a prediction set with finite-sample exact coverage, conditional on the unit being selected by a given procedure. The general form of our method accommodates arbitrary selection rules that are invariant to the permutation of the calibration units, and generalizes Mondrian Conformal Prediction to multiple test units and non-equivariant classifiers. We also work out computationally efficient implementation of our framework for a number of realistic selection rules, including top-K selection, optimization-based selection, selection based on conformal p-values, and selection based on properties of preliminary conformal prediction sets. The performance of our methods is demonstrated via applications in drug discovery and health risk prediction.
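To make the selection-bias issue concrete, the following Python sketch builds standard split conformal intervals and compares their coverage over all test units with their coverage over the top-K selected units. Everything here is an illustrative assumption rather than material from the paper: the synthetic data (noise that grows with a covariate `x`), the absolute-residual score, the selection rule (largest `x`), and the function names.

```python
import numpy as np

def conformal_quantile(scores, alpha):
    """The ceil((1 - alpha) * (n + 1))-th smallest calibration score,
    or +inf when that rank exceeds the number of scores."""
    scores = np.sort(np.asarray(scores, dtype=float))
    n = scores.size
    rank = int(np.ceil((1 - alpha) * (n + 1)))
    return scores[rank - 1] if rank <= n else np.inf

def simulate_coverage(n_cal=1000, n_test=2000, k=200, alpha=0.1, seed=0):
    """Return (coverage over all test units, coverage over selected units)."""
    rng = np.random.default_rng(seed)
    # Hypothetical data: the outcome has mean 0 and a noise level that
    # increases with x; the point prediction is the true mean, 0.
    x_cal = rng.uniform(size=n_cal)
    x_test = rng.uniform(size=n_test)
    y_cal = rng.normal(scale=0.2 + 2.0 * x_cal)
    y_test = rng.normal(scale=0.2 + 2.0 * x_test)

    # Marginal split conformal interval: [-q, q] for every test unit.
    q = conformal_quantile(np.abs(y_cal), alpha)
    covered = np.abs(y_test) <= q

    # Focal units: the k test units with the largest covariate value.
    selected = np.argsort(-x_test)[:k]
    return covered.mean(), covered[selected].mean()

marginal_cov, focal_cov = simulate_coverage()
print(f"all test units: {marginal_cov:.3f}, selected units: {focal_cov:.3f}")
```

In this toy setup the selected units are exactly the noisiest ones, so the interval calibrated for a randomly drawn test point is too narrow for them. A selection rule that favored low-noise units would push the error in the opposite direction and produce over-coverage.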
