Selecting informative conformal prediction sets with false coverage rate control
Ulysse Gazin, Ruth Heller, Ariane Marandon, Etienne Roquain
TL;DR
The paper addresses the construction of conformal prediction sets after a data-driven informativeness selection, with finite-sample control of the false coverage rate (FCR) on the selected subset. It introduces two methods, InfoSP and InfoSCOP, that combine informative selection with conformal prediction, ensuring $\mathrm{FCR} \leq \alpha$ while reporting only sets from a pre-specified informative family $\mathcal{I}$; the theory rests on a general FCR control theorem under concordant selection and on adjusted $p$-values. The authors provide concrete instantiations for regression (excluding intervals or limiting interval length) and classification (excluding a null class or non-trivial sets), with simulations and real-data experiments (yeast gene expression and CIFAR-10) illustrating improved power after informative selection while preserving error guarantees. The work unifies a broad class of informative constraints with conformal calibration, offering practical tools for reporting only meaningful prediction sets in high-throughput or heterogeneous-data contexts and suggesting future extensions such as adaptive scoring and directional error control.
Abstract
In supervised learning, including regression and classification, conformal methods provide prediction sets for the outcome/label with finite-sample coverage for any machine learning predictor. We consider here the case where such prediction sets come after a selection process. The selection process requires that the selected prediction sets be 'informative' in a well-defined sense. We consider both the classification and regression settings, where the analyst may consider as informative only the sample points whose prediction sets are small enough, exclude null values, or obey other appropriate 'monotone' constraints. We develop a unified framework for building such informative conformal prediction sets while controlling the false coverage rate (FCR) on the selected sample. While conformal prediction sets after selection have been the focus of much recent literature in the field, the newly introduced procedures, called InfoSP and InfoSCOP, are to our knowledge the first to provide FCR control for informative prediction sets. We show the usefulness of the resulting procedures on real and simulated data.
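
To make the quantity being controlled concrete, the sketch below computes the false coverage proportion (FCP), that is, the fraction of selected prediction sets that miss the true outcome; the FCR is the expectation of this proportion. It is a minimal illustration and not the InfoSP or InfoSCOP procedure: it builds standard split-conformal intervals from absolute residuals and then naively keeps only the intervals that exclude 0, a selection step for which the marginal level $\alpha$ need not carry over to the selected subset. The function names, the identity predictor, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def split_conformal_intervals(cal_pred, cal_y, test_pred, alpha):
    """Marginal split-conformal intervals from absolute-residual scores."""
    scores = np.abs(np.asarray(cal_y) - np.asarray(cal_pred))
    n = scores.size
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = int(np.ceil((n + 1) * (1 - alpha)))
    # If the rank exceeds n, the interval is the whole real line.
    q = np.inf if k > n else np.sort(scores)[k - 1]
    test_pred = np.asarray(test_pred, dtype=float)
    return test_pred - q, test_pred + q


def false_coverage_proportion(lower, upper, y, selected):
    """Fraction of selected intervals missing y (0 when nothing is selected)."""
    selected = np.asarray(selected, dtype=bool)
    miss = (np.asarray(y) < lower) | (np.asarray(y) > upper)
    return np.sum(miss & selected) / max(np.sum(selected), 1)


# Synthetic data (assumption): y = x + noise, with a hypothetical
# pretrained predictor mu(x) = x.
rng = np.random.default_rng(0)
x_cal, x_test = rng.normal(size=500), rng.normal(size=200)
y_cal = x_cal + rng.normal(scale=0.5, size=500)
y_test = x_test + rng.normal(scale=0.5, size=200)

lower, upper = split_conformal_intervals(x_cal, y_cal, x_test, alpha=0.1)

# Naive "informative" selection: keep intervals that exclude the null value 0.
selected = (lower > 0) | (upper < 0)
print("selected:", int(selected.sum()))
print("FCP on selected:", false_coverage_proportion(lower, upper, y_test, selected))
```

Averaging the printed FCP over many independent draws of the data gives a Monte Carlo estimate of the FCR of this naive procedure, which can be compared against the nominal level $\alpha$.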
