Characterizing the Training-Conditional Coverage of Full Conformal Inference in High Dimensions
Isaac Gibbs, Emmanuel J. Candès
TL;DR
This work analyzes training-conditional coverage for full conformal inference in high-dimensional regression under proportional asymptotics. It proves that, for high-dimensional ridge regression and under a broad class of convex losses, full conformal prediction sets achieve asymptotically exact training-conditional coverage, while uncorrected methods that compare test residuals directly to training residuals exhibit systematic undercoverage. It also shows that full conformal inference remains asymptotically valid when the regularization level is selected using only the training set, without consideration of the test point, implying potential computational savings for hyperparameter tuning. The paper further discusses heuristic and simulation-based extensions to the LASSO and quantile regression, which do not directly satisfy the theoretical assumptions. Overall, it clarifies when full conformal corrections are essential for reliable uncertainty quantification and when they can be skipped.
Abstract
We study the coverage properties of full conformal regression in the proportional asymptotic regime where the ratio of the dimension to the sample size converges to a constant. In this setting, existing theory tells us only that full conformal inference is unbiased, in the sense that its average coverage lies at the desired level when marginalized over both the new test point and the training data. Considerably less is known about the behaviour of these methods conditional on the training set. As a result, the exact benefits of full conformal inference over much simpler alternative methods are unclear. This paper investigates the behaviour of full conformal inference and natural uncorrected alternatives for a broad class of $L_2$-regularized linear regression models. We show that in the proportional asymptotic regime the training-conditional coverage of full conformal inference concentrates at the target value. On the other hand, simple alternatives that directly compare test and training residuals incur a constant undercoverage bias. While these results demonstrate the necessity of full conformal inference in correcting for high-dimensional overfitting, we also show that this same methodology is redundant for the related task of tuning the regularization level. In particular, we show that full conformal inference still yields asymptotically valid coverage when the regularization level is selected using only the training set, without consideration of the test point. Simulations show that our asymptotic approximations are accurate in finite samples and can be readily extended to other popular full conformal variants, such as full conformal quantile regression and the LASSO, that do not directly meet our assumptions.
