A Few Observations on Sample-Conditional Coverage in Conformal Prediction
John C. Duchi
TL;DR
This work studies conditional validity in conformal prediction. It shows that split-conformal methods provide high-probability sample-conditional coverage guarantees, meaning guarantees that hold conditional on the validation data, and that quantile regression on held-out data yields approximate weighted-conditional coverage. It formalizes the weighted-conditional framework through a function class W and proves minimax rate-optimal guarantees for approximate conditional coverage. The bounds rely on empirical-process arguments, using Talagrand-type concentration and VC-dimension tools, and the paper gives building-block proofs for one- and two-sided coverage deviations in several settings, including the case of distinct scores. Experiments on synthetic data and CIFAR-100 support the practical viability of adaptive-threshold split-conformal prediction, while highlighting its computational advantages and the work still needed to achieve exact conditional validity in high-dimensional, data-limited regimes.
Abstract
We revisit the problem of constructing predictive confidence sets for which we wish to obtain some type of conditional validity. We provide new arguments showing how ``split conformal'' methods achieve near-desired coverage levels with high probability, a guarantee conditional on the validation data rather than marginal over it. In addition, we directly consider (approximate) conditional coverage, where, e.g., conditional on a covariate $X$ belonging to some group of interest, we would like a guarantee that a predictive set covers the true outcome $Y$. We show that the natural method of performing quantile regression on a held-out (validation) dataset yields minimax optimal guarantees of coverage here. Complementing these positive results, we also provide experimental evidence that interesting work remains to be done to develop computationally efficient but valid predictive inference methods.
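
As a concrete illustration of the split-conformal setup described above, the following is a minimal sketch and not the paper's implementation. It assumes a scalar nonconformity score (here the absolute residual of a hypothetical predictor that always outputs 0, with $Y$ standard normal) and uses the standard $\lceil (n+1)(1-\alpha) \rceil$-th order statistic of the validation scores as the threshold. The function name `split_conformal_threshold` and the simulated data are illustrative assumptions.

```python
import numpy as np


def split_conformal_threshold(scores, alpha):
    """Threshold from validation scores: the ceil((n+1)(1-alpha))-th smallest score.

    Returns +inf when that rank exceeds n, i.e. the validation set is too
    small to certify the requested level.
    """
    scores = np.sort(np.asarray(scores, dtype=float))
    n = scores.size
    k = int(np.ceil((n + 1) * (1 - alpha)))  # 1-indexed rank
    if k > n:
        return np.inf
    return scores[k - 1]


# Simulated data (assumption): the hypothetical model predicts 0 and Y ~ N(0, 1),
# so the nonconformity score is |y - 0| = |y|.
rng = np.random.default_rng(0)
n_val, n_test, alpha = 2000, 20000, 0.1

val_scores = np.abs(rng.standard_normal(n_val))
tau = split_conformal_threshold(val_scores, alpha)

# Coverage on fresh draws, with the validation sample (and hence tau) held fixed.
test_scores = np.abs(rng.standard_normal(n_test))
coverage = float(np.mean(test_scores <= tau))
print(f"threshold = {tau:.3f}, estimated coverage = {coverage:.3f}")
```

Because the threshold is computed from one fixed validation sample, the coverage estimated on fresh test draws approximates the sample-conditional coverage for that particular validation set. Rerunning with different seeds shows how it fluctuates around $1 - \alpha$.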
