Improving the statistical efficiency of cross-conformal prediction
Matteo Gasparin, Aaditya Ramdas
TL;DR
This work improves the statistical efficiency of cross-conformal prediction by introducing new variants that shrink prediction sets without sacrificing finite-sample marginal coverage. Building on recent results for combining p-values under exchangeability and randomization, the authors develop several methods, including e-mod-cross, u-mod-cross, and eu-mod-cross, that maintain marginal coverage of at least $1-2\alpha$ (or $1-2\alpha'$ under refinements) while reducing set width. They provide theoretical guarantees and show through simulations and real-data applications that these variants can substantially decrease set size, at the cost of added variability due to randomization and dependence. The results offer practical guidance for deploying conformal prediction in settings where computational efficiency and tighter uncertainty quantification are crucial, while preserving distribution-free validity.
Abstract
Vovk (2015) introduced cross-conformal prediction, a modification of split conformal prediction designed to reduce the width of prediction sets. The method, when trained with a miscoverage rate equal to $\alpha$ and $n \gg K$, ensures a marginal coverage of at least $1 - 2\alpha - 2(1-\alpha)(K-1)/(n+K)$, where $n$ is the number of observations and $K$ denotes the number of folds. A simple modification of the method achieves coverage of at least $1-2\alpha$. In this work, we propose new variants of both methods that yield smaller prediction sets without compromising the latter guarantee of $1-2\alpha$ coverage. The proposed methods are based on recent results deriving more statistically efficient combinations of p-values that leverage exchangeability and randomization. Simulations confirm the theoretical findings and bring out some important tradeoffs.
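To make the two coverage guarantees quoted above concrete, the following minimal sketch evaluates them numerically. The function names and the input checks are illustrative assumptions, not part of the paper; only the two formulas are taken from the abstract.

```python
def cross_conformal_coverage_bound(alpha: float, n: int, K: int) -> float:
    """Lower bound 1 - 2*alpha - 2*(1 - alpha)*(K - 1)/(n + K) from the abstract.

    alpha: nominal miscoverage rate, n: number of observations, K: number of folds.
    The function name and the argument checks are hypothetical, for illustration.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if K < 1 or n < K:
        raise ValueError("need 1 <= K <= n")
    return 1.0 - 2.0 * alpha - 2.0 * (1.0 - alpha) * (K - 1) / (n + K)


def modified_coverage_bound(alpha: float) -> float:
    """Lower bound 1 - 2*alpha stated for the modified method."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return 1.0 - 2.0 * alpha


if __name__ == "__main__":
    alpha, n = 0.1, 1000
    for K in (1, 5, 10, 50):
        bound = cross_conformal_coverage_bound(alpha, n, K)
        print(f"K={K:3d}  cross-conformal bound = {bound:.4f}  "
              f"modified bound = {modified_coverage_bound(alpha):.4f}")
```

The extra term $2(1-\alpha)(K-1)/(n+K)$ vanishes when $K = 1$ and stays small when $n \gg K$, which is why the first bound is close to $1-2\alpha$ in that regime.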
