Characterizing the Training-Conditional Coverage of Full Conformal Inference in High Dimensions

Isaac Gibbs, Emmanuel J. Candès

TL;DR

This work analyzes the training-conditional coverage of full conformal inference in high-dimensional regression under proportional asymptotics. It proves that, for ridge regression and, more broadly, $L_2$-regularized linear regression with a broad class of convex losses, full conformal prediction sets achieve asymptotically exact training-conditional coverage, whereas uncorrected residual-based methods exhibit a systematic undercoverage bias. It also shows that full conformal inference remains asymptotically valid when the regularization level is chosen using the training data alone, implying potential computational savings for hyperparameter tuning. The paper further discusses heuristic and simulation-based extensions to the LASSO and quantile regression, illustrating the broader applicability of the training-conditional conformal framework in high dimensions. Overall, it clarifies when full conformal corrections are essential for reliable uncertainty quantification and demonstrates practical avenues for implementing them efficiently.

Abstract

We study the coverage properties of full conformal regression in the proportional asymptotic regime where the ratio of the dimension to the sample size converges to a constant. In this setting, existing theory tells us only that full conformal inference is unbiased, in the sense that its average coverage lies at the desired level when marginalized over both the new test point and the training data. Considerably less is known about the behaviour of these methods conditional on the training set. As a result, the exact benefits of full conformal inference over much simpler alternative methods are unclear. This paper investigates the behaviour of full conformal inference and natural uncorrected alternatives for a broad class of $L_2$-regularized linear regression models. We show that in the proportional asymptotic regime the training-conditional coverage of full conformal inference concentrates at the target value. On the other hand, simple alternatives that directly compare test and training residuals suffer from a constant undercoverage bias. While these results demonstrate the necessity of full conformal inference for correcting high-dimensional overfitting, we also show that this same methodology is redundant for the related task of tuning the regularization level. In particular, we show that full conformal inference still yields asymptotically valid coverage when the regularization level is selected using only the training set, without consideration of the test point. Simulations show that our asymptotic approximations are accurate in finite samples and can be readily extended to other popular full conformal variants, such as full conformal quantile regression and the LASSO, that do not directly meet our assumptions.
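To make the contrast between the two procedures concrete, here is a minimal sketch for ridge regression. It assumes the objective (1/n)||y - Xb||^2 + tau ||b||^2, absolute-residual conformity scores, and a Gaussian linear model for the toy data; none of these choices are taken from the paper, and the helper names `ridge_fit`, `uncorrected_interval` and `full_conformal_set` are hypothetical. The uncorrected method fits once on the training set and compares the test residual against training residuals, while full conformal refits with each candidate test response included (approximated here on a finite grid).

```python
import numpy as np


def ridge_fit(X, y, tau):
    # Assumed objective: (1/n) * ||y - X b||^2 + tau * ||b||^2.
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + tau * np.eye(d), X.T @ y / n)


def uncorrected_interval(X, y, x_new, tau, alpha):
    # Fit on the training data only, then use the quantile of in-sample residuals.
    n = len(y)
    beta = ridge_fit(X, y, tau)
    k = int(np.ceil((1 - alpha) * (n + 1)))
    if k > n:
        return -np.inf, np.inf
    q = np.sort(np.abs(y - X @ beta))[k - 1]
    center = x_new @ beta
    return center - q, center + q


def full_conformal_set(X, y, x_new, tau, alpha, grid):
    # Grid approximation: refit with (x_new, y_cand) included for every candidate y_cand.
    n, d = X.shape
    m = n + 1
    X_aug = np.vstack([X, x_new])
    # Ridge fitted values are linear in the responses, so one hat matrix serves all candidates.
    A = X_aug.T @ X_aug / m + tau * np.eye(d)
    H = X_aug @ np.linalg.solve(A, X_aug.T) / m
    accepted = []
    for y_cand in grid:
        y_aug = np.append(y, y_cand)
        scores = np.abs(y_aug - H @ y_aug)
        # Conformal p-value: share of the n + 1 scores at least as large as the test score.
        p_value = np.mean(scores >= scores[-1])
        if p_value > alpha:
            accepted.append(y_cand)
    return np.array(accepted)


rng = np.random.default_rng(0)
n, d, tau, alpha = 200, 50, 0.1, 0.1  # d / n = 0.25
beta_star = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = X @ beta_star + rng.normal(size=n)
x_new = rng.normal(size=d)

lo, hi = uncorrected_interval(X, y, x_new, tau, alpha)
center = 0.5 * (lo + hi)
grid = np.linspace(center - 10, center + 10, 2001)
fc = full_conformal_set(X, y, x_new, tau, alpha, grid)
print(f"uncorrected: [{lo:.3f}, {hi:.3f}]")
print(f"full conformal (grid hull): [{fc.min():.3f}, {fc.max():.3f}]")
```

Because the candidate test point enters every refit, its residual is shrunk by overfitting in the same way as the training residuals; the uncorrected method compares an out-of-sample test residual with in-sample training residuals and so lacks this symmetry.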

Paper Structure

This paper contains 30 sections, 41 theorems, 247 equations, 4 figures.

Key Result

Theorem 1

[Theorem 3.7 of Liang2023] Assume that $\{(X_i,Y_i)\}_{i=1}^n$ are i.i.d. and $\hat{\mu}(\cdot)$ is in-sample stable. Let $\{\rho_n\}_{n=1}^{\infty}$ be defined as above. Then,

Figures (4)

  • Figure 1: Empirical distribution of the training-conditional miscoverage of full conformal ridge regression (orange) and the uncorrected method (blue) that does not include the new test point in the fit. Boxplots in the figure show results from 100 trials, where in each trial the training-conditional miscoverage is evaluated empirically by averaging over 2000 test points. Dotted lines display the asymptotic miscoverages predicted by our theory. Note that since full conformal obtains the desired miscoverage asymptotically, the orange line also displays the target level of $\alpha = 0.1$.
  • Figure 2: Empirical distribution of the training-conditional miscoverage of full conformal ridge regression with constant regularization level $\tau = 0.1$ (solid) and with $\tau$ chosen by leave-one-out cross-validation (striped) from the set $\{0,0.01,0.02,\dots,2\}$. Boxplots in the figure show results from 100 trials, where in each trial the training-conditional miscoverage is evaluated empirically by averaging over 2000 test points. The orange dotted line shows the target level of $\alpha = 0.1$. For all sample sizes the dimension is set so that $d/n=0.25$, and data are generated as in Figure 1.
  • Figure 3: Empirical evaluation of our conjectures for the high-dimensional LASSO. Dots and error bars in the left panel display point estimates and 95% confidence intervals for $\mathbb{P}(Y_{n+1} \notin \hat{C}_{\textup{uncorr.}})$ obtained by averaging over 10000 trials at $n=800$, while dotted lines show the values predicted by our theory (namely the value appearing on the right-hand side of (\ref{eq:lasso_asymp_cov})). Boxplots in the right panel show the empirical distribution of the training-conditional miscoverage of the full conformal LASSO (orange) and its uncorrected variant (blue) taken over $100$ trials, where in each trial the training-conditional miscoverage is evaluated empirically by averaging over 2000 test points. Results in this panel are for $\tau = 1$ and $d/n = 0.25$. Once again, dotted lines display the asymptotic miscoverages predicted by our theory. Note that since full conformal obtains the desired miscoverage asymptotically, the orange line also displays the target level of $\alpha = 0.1$ used throughout. Data for both panels were generated as in Figure 1.
  • Figure 4: Empirical validation of our conjectures for high-dimensional quantile regression. Dots and error bars display point estimates and 95% confidence intervals for the values of $\mathbb{E}[\|\hat{\beta} - \beta^*\|_2]$ (top left panel), $\mathbb{E}[\hat{\beta}_0]$ (top right panel), $\mathbb{P}(Y_{n+1} \leq \hat{\beta}_{0,(n+1)} + X_{n+1}^\top \hat{\beta}_{(n+1)})$ (bottom right panel, blue) and $\mathbb{P}(Y_{n+1} \in \hat{C}_{\text{dual}}^{\text{QR}})$ (bottom right panel, orange), while dotted lines show the values predicted by our theory. Finally, violin plots in the bottom left panel show the empirical distribution of the error in our leave-one-out formula for $\eta_{n+1}$ (namely the absolute value of the right-hand side of (\ref{eq:eta_loo_rep})). All panels contain results from 1000 trials at $n=400$ and $\alpha = 0.1$, with data generated as in Figure 1.
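Figures 1 and 2 estimate training-conditional miscoverage by fixing one training set and averaging over 2000 fresh test points. The sketch below mimics that protocol for a single trial under stated assumptions: a Gaussian linear model with unit noise, absolute-residual scores, the ridge objective (1/n)||y - Xb||^2 + tau ||b||^2, and the closed-form leave-one-out shortcut e_i / (1 - H_ii) with the penalty n * tau held fixed when a point is dropped. The function names `loo_select_tau` and `conditional_miscoverage` are hypothetical. One convenient observation is that checking whether a given Y_{n+1} is covered by the full conformal set only requires evaluating the single candidate y = Y_{n+1}, so no grid search is needed.

```python
import numpy as np


def loo_select_tau(X, y, taus):
    # Leave-one-out CV for ridge via the shortcut e_i / (1 - H_ii), treating the
    # penalty lam = n * tau as fixed when a point is dropped (an assumption).
    n, _ = X.shape
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    Uty = U.T @ y
    best_tau, best_err = None, np.inf
    for tau in taus:
        shrink = s**2 / (s**2 + n * tau)  # eigenvalues of the hat matrix H
        fitted = U @ (shrink * Uty)
        lev = (U**2) @ shrink  # diagonal entries H_ii
        err = np.mean(((y - fitted) / (1.0 - lev)) ** 2)
        if err < best_err:
            best_tau, best_err = tau, err
    return best_tau


def conditional_miscoverage(X, y, tau, alpha, X_test, y_test):
    # Monte Carlo estimate of P(Y not in C | training data) for both methods.
    n, d = X.shape
    m = n + 1
    k = int(np.ceil((1 - alpha) * (n + 1)))
    G, b, I = X.T @ X, X.T @ y, np.eye(d)
    # Uncorrected: one fit on the training data, quantile of training residuals.
    beta = np.linalg.solve(G / n + tau * I, b / n)
    q = np.sort(np.abs(y - X @ beta))[k - 1] if k <= n else np.inf
    miss_uncorr = np.mean(np.abs(y_test - X_test @ beta) > q)
    # Full conformal: Y_test is covered iff its conformal p-value exceeds alpha,
    # so one augmented refit per test point suffices.
    misses = 0
    for x, yt in zip(X_test, y_test):
        beta_aug = np.linalg.solve((G + np.outer(x, x)) / m + tau * I, (b + x * yt) / m)
        s_train = np.abs(y - X @ beta_aug)
        s_test = abs(yt - x @ beta_aug)
        p_value = (1 + np.sum(s_train >= s_test)) / m
        misses += int(p_value <= alpha)
    return miss_uncorr, misses / len(y_test)


rng = np.random.default_rng(1)
n, d, alpha, n_test = 200, 50, 0.1, 2000  # d / n = 0.25
beta_star = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = X @ beta_star + rng.normal(size=n)
X_test = rng.normal(size=(n_test, d))
y_test = X_test @ beta_star + rng.normal(size=n_test)

taus = np.round(np.arange(0.0, 2.0 + 1e-9, 0.01), 2)  # {0, 0.01, ..., 2}
tau_hat = loo_select_tau(X, y, taus)
miss_uncorr, miss_full = conditional_miscoverage(X, y, tau_hat, alpha, X_test, y_test)
print(f"tau_hat = {tau_hat}, uncorrected miscoverage = {miss_uncorr:.3f}, "
      f"full conformal miscoverage = {miss_full:.3f}")
```

Repeating this over many independent training sets would give a boxplot of the kind shown in Figures 1 and 2; the SVD is computed once so that scanning the 201 candidate values of tau stays cheap.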

Theorems & Definitions (77)

  • Theorem 1
  • Lemma 1
  • Corollary 1
  • Proof sketch of Lemma \ref{lem:ridge_stab}
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • Theorem 2
  • Lemma 6
  • ...and 67 more