Co-optimization for Adaptive Conformal Prediction

Xiaoyi Su; Zhixin Zhou; Rui Luo

TL;DR

This work proposes Co-optimization for Adaptive Conformal Prediction (CoCP), a framework that learns prediction intervals by jointly optimizing a center and a radius. It shows that, under standard regularity conditions, CoCP asymptotically approaches the length-minimizing conditional interval at the target coverage level as the estimation error and smoothing vanish.
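To make the "length-minimizing interval at the target coverage level" concrete, the sketch below compares, for a single skewed distribution with no covariates, the equal-tailed interval against the shortest interval with the same coverage (the HDI, for a unimodal density). The LogNormal(0, 1) distribution, the helper names, and the grid search over the lower-tail mass are illustrative assumptions, not part of CoCP.

```python
import math
from statistics import NormalDist

def lognormal_ppf(p, sigma=1.0):
    """Quantile of LogNormal(0, sigma^2): exp(sigma * Phi^{-1}(p))."""
    return math.exp(sigma * NormalDist().inv_cdf(p))

def equal_tailed(alpha, sigma=1.0):
    """Interval that leaves alpha/2 probability in each tail."""
    return lognormal_ppf(alpha / 2, sigma), lognormal_ppf(1 - alpha / 2, sigma)

def shortest_interval(alpha, sigma=1.0, grid=10_000):
    """Scan the lower-tail mass t in (0, alpha) and keep the shortest
    interval [F^{-1}(t), F^{-1}(t + 1 - alpha)]; every candidate has coverage 1 - alpha."""
    best = None
    for i in range(1, grid):
        t = alpha * i / grid
        lo, hi = lognormal_ppf(t, sigma), lognormal_ppf(t + 1 - alpha, sigma)
        if best is None or hi - lo < best[1] - best[0]:
            best = (lo, hi)
    return best

alpha = 0.1
eq_lo, eq_hi = equal_tailed(alpha)
sh_lo, sh_hi = shortest_interval(alpha)
print(f"equal-tailed: [{eq_lo:.3f}, {eq_hi:.3f}], length {eq_hi - eq_lo:.3f}")
print(f"shortest:     [{sh_lo:.3f}, {sh_hi:.3f}], length {sh_hi - sh_lo:.3f}")
```

For this distribution the equal-tailed 90% interval has length close to 5.0, while the shortest one is roughly 3.6 and sits much closer to the mode, which is the displacement that Figure 1(a) describes.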

Abstract

Conformal prediction (CP) provides finite-sample, distribution-free marginal coverage, but standard conformal regression intervals can be inefficient under heteroscedasticity and skewness. In particular, popular constructions such as conformalized quantile regression (CQR) often inherit a fixed notion of center and enforce equal-tailed errors, which can displace the interval away from high-density regions and produce unnecessarily wide sets. We propose Co-optimization for Adaptive Conformal Prediction (CoCP), a framework that learns prediction intervals by jointly optimizing a center $m(x)$ and a radius $h(x)$. CoCP alternates between (i) learning $h(x)$ via quantile regression on the folded absolute residual around the current center, and (ii) refining $m(x)$ with a differentiable soft-coverage objective whose gradients concentrate near the current boundaries, effectively correcting mis-centering without estimating the full conditional density. Finite-sample marginal validity is guaranteed by split-conformal calibration with a normalized nonconformity score. Theory characterizes the population fixed point of the soft objective and shows that, under standard regularity conditions, CoCP asymptotically approaches the length-minimizing conditional interval at the target coverage level as the estimation error and smoothing vanish. Experiments on synthetic and real benchmarks demonstrate that CoCP consistently yields shorter intervals and achieves state-of-the-art conditional-coverage diagnostics.
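The split-conformal calibration step can be sketched as follows. This assumes the normalized nonconformity score takes the common form $|y - m(x)|/h(x)$ and that the conformal quantile is the usual $\lceil (n+1)(1-\alpha) \rceil$-th smallest calibration score; the paper's exact score may differ. The functions `conformal_scale` and `calibrated_interval`, and the fixed stand-ins `m_hat` and `h_hat` for the learned center and radius, are hypothetical names used only for illustration.

```python
import numpy as np

def conformal_scale(y_cal, m_cal, h_cal, alpha):
    """Conformal quantile of the normalized score |y - m(x)| / h(x) on a calibration split."""
    scores = np.abs(y_cal - m_cal) / h_cal          # assumes h(x) > 0
    n = scores.size
    k = int(np.ceil((n + 1) * (1 - alpha)))         # rank of the finite-sample quantile
    if k > n:
        return np.inf                               # too few points: the interval is unbounded
    return float(np.sort(scores)[k - 1])            # k-th smallest score

def calibrated_interval(m_test, h_test, q_hat):
    """Symmetric interval m(x) +/- q_hat * h(x)."""
    return m_test - q_hat * h_test, m_test + q_hat * h_test

# Toy check with hypothetical stand-ins for the learned center and radius.
rng = np.random.default_rng(0)

def sample(n):
    x = rng.uniform(0.0, 1.0, n)
    y = x + (0.5 + x) * rng.lognormal(mean=0.0, sigma=0.5, size=n)  # skewed, heteroscedastic noise
    return x, y

def m_hat(x):
    return x + 1.0        # hypothetical center

def h_hat(x):
    return 0.5 + x        # hypothetical radius

alpha = 0.1
x_cal, y_cal = sample(2_000)
q_hat = conformal_scale(y_cal, m_hat(x_cal), h_hat(x_cal), alpha)
x_te, y_te = sample(20_000)
lo, hi = calibrated_interval(m_hat(x_te), h_hat(x_te), q_hat)
coverage = np.mean((y_te >= lo) & (y_te <= hi))
print(f"q_hat = {q_hat:.3f}, empirical test coverage = {coverage:.3f}")
```

The empirical coverage should land close to 0.9 here. Note that the guarantee is marginal: calibration rescales a single multiplier `q_hat`, so the shape of the interval across $x$ still comes entirely from the learned $m(x)$ and $h(x)$.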

Paper Structure (73 sections, 13 theorems, 108 equations, 7 figures, 4 tables, 1 algorithm)

Introduction
Related Work
The Geometry of Efficient Intervals
Setup and the oracle target
Residual folding
Why translating the center can shorten the interval
Co-optimizing Interval Geometry via Alternating Learning
A split version of CoCP
Radius update: folded quantile regression
Center update: boundary-local soft coverage
Conformal calibration
Cross-fitting and ensembling
Theoretical Properties
Finite-sample marginal validity
Temperature-dependent population target
...and 58 more sections

Key Result

Theorem 1

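If the calibration set and a new test sample $(X,Y)$ are exchangeable conditional on the training fit (i.e., the standard split-conformal setting), then the calibrated CoCP interval $\widehat{C}(X)$ satisfies $\mathbb{P}\{Y \in \widehat{C}(X)\} \ge 1-\alpha$.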
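The standard split-conformal rank argument behind a finite-sample guarantee of this kind can be written out as below. It assumes, for illustration, a normalized score $S = |Y - \hat m(X)|/\hat h(X)$ with $\hat h > 0$, calibration scores $S_1,\dots,S_n$, a test score $S_{n+1}$, and $S_{(k)}$ the $k$-th smallest calibration score (with $\hat q = +\infty$ when $k > n$); the paper's exact score and constants may differ.

```latex
\begin{aligned}
S_i &= \frac{|Y_i - \hat m(X_i)|}{\hat h(X_i)}, \qquad i = 1,\dots,n+1, \\
\hat q &= S_{(k)}, \qquad k = \lceil (n+1)(1-\alpha) \rceil, \\
\mathbb{P}\{S_{n+1} \le \hat q\} &\ge \frac{k}{n+1} \ge 1-\alpha
  \quad \text{(the rank of } S_{n+1} \text{ among } S_1,\dots,S_{n+1} \text{ is uniform under exchangeability)}, \\
S_{n+1} \le \hat q &\iff Y_{n+1} \in \big[\hat m(X_{n+1}) - \hat q\,\hat h(X_{n+1}),\ \hat m(X_{n+1}) + \hat q\,\hat h(X_{n+1})\big].
\end{aligned}
```

The last line is where the normalization enters: because $\hat h > 0$, the score threshold translates directly into an interval of half-width $\hat q\,\hat h(X)$ around $\hat m(X)$.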

Figures (7)

Figure 1: From any interval to HDI via folding and boundary balancing. (a) Under skewness, equal-tailed intervals can be displaced relative to the $(1-\alpha)$-HDI and are typically longer. (b) Folding around a candidate center $m$ maps $y$ to $\lvert y-m\rvert$, producing a two-layer view. (c) The folded $(1-\alpha)$-boundary corresponds to two endpoints; translating $m$ shortens the interval when the endpoint densities are imbalanced (push-pull). (d) At equilibrium the endpoint densities match, recovering the HDI.
Figure 2: Gradient dynamics of the soft-coverage objective. (a) The soft window serves as a differentiable surrogate for the interval indicator. The underlying data density (dashed line) is imbalanced at the two boundaries. (b) The gradient of the soft window with respect to the center $m$ acts as a signed sampling kernel, being negative at the left boundary and positive at the right. The resulting local gradient (the product of the conditional density and the derivative kernel) is asymmetric under skewness; the higher-density boundary generates a dominant directional signal that shifts the center $m$ toward the concentration of probability mass.
Figure 3: Overview of the CoCP framework. The pipeline begins by parameterizing the prediction interval into a center $m(x)$ and a radius $h(x)$, which are then iteratively refined through an alternating co-optimization process during K-fold cross-fitting. The final adaptive interval is constructed by aggregating the learned components and applying a split-conformal calibration to ensure marginal validity.
Figure 4: Prediction intervals on the synthetic Normal dataset. This visualization compares nine conformal methods under symmetric, heteroscedastic Gaussian noise. The gray area denotes the Oracle $(1-\alpha)$-HDI. Most adaptive methods, including CoCP, successfully capture the symmetric uncertainty, while the Split baseline (top-left) produces overly conservative, non-adaptive intervals.
Figure 5: Prediction intervals on the synthetic LogNormal dataset. Under skewed conditional noise, the discrepancy between center-invariant methods (e.g., CQR) and the Oracle HDI (gray) becomes prominent. CoCP (bottom-right) demonstrates its ability to shift the interval center and scale the radius simultaneously, resulting in a geometry that closely aligns with the Oracle's bounds.
...and 2 more figures
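The push-pull mechanism of Figures 1 and 2 can be reproduced in a covariate-free toy setting. The sketch below alternates a radius step, which sets $h$ to the empirical $(1-\alpha)$-quantile of $|y - m|$ (a minimizer of the pinball loss on folded residuals for a constant radius), with one gradient-ascent step in $m$ on a soft coverage objective. The logistic soft window $\sigma(\beta(y-m+h))\,\sigma(\beta(m+h-y))$, the Gamma(2, 1) data, and the values of `beta`, `lr`, and `n_iter` are assumptions for illustration; the paper's soft window, step sizes, and learned models $m(x)$, $h(x)$ may differ.

```python
import numpy as np

def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    return np.exp(-np.logaddexp(0.0, -z))

def soft_coverage_grad(y, m, h, beta):
    """d/dm of the mean soft window w = sig(beta(y-m+h)) * sig(beta(m+h-y)).

    Uses dw/dm = beta * w * (s_left - s_right): negative weight near the left
    endpoint m - h and positive weight near the right endpoint m + h, so the
    mean approximates f(m + h) - f(m - h) for large beta.
    """
    s_left = sigmoid(beta * (y - m + h))
    s_right = sigmoid(beta * (m + h - y))
    w = s_left * s_right
    return float(np.mean(beta * w * (s_left - s_right)))

def alternate(y, alpha=0.1, beta=20.0, lr=1.0, n_iter=100):
    """Toy alternation: folded-quantile radius step, then soft-coverage center step."""
    m = float(np.median(y))  # start from a conventional center
    for _ in range(n_iter):
        # Radius step: (1 - alpha)-quantile of the folded residuals |y - m|.
        h = float(np.quantile(np.abs(y - m), 1 - alpha))
        # Center step: one gradient-ascent step on soft coverage with h held fixed.
        m += lr * soft_coverage_grad(y, m, h, beta)
    h = float(np.quantile(np.abs(y - m), 1 - alpha))
    return m, h

rng = np.random.default_rng(0)
y = rng.gamma(shape=2.0, scale=1.0, size=20_000)  # right-skewed toy data
m, h = alternate(y)
eq_lo, eq_hi = np.quantile(y, [0.05, 0.95])
print(f"co-optimized: [{m - h:.2f}, {m + h:.2f}], length {2 * h:.2f}")
print(f"equal-tailed: [{eq_lo:.2f}, {eq_hi:.2f}], length {eq_hi - eq_lo:.2f}")
```

The center drifts to the right of the median until the sample densities at the two endpoints roughly balance. For the Gamma(2, 1) population, the 90% HDI is approximately [0.08, 3.94] (length about 3.86), compared with an equal-tailed length of about 4.39, so the co-optimized interval should come out noticeably shorter at the same empirical coverage.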

Theorems & Definitions (30)

Remark 1
Theorem 1: Finite-sample marginal coverage
Lemma 1: Soft-gradient endpoint imbalance
Definition 1: $\beta$-soft oracle
Lemma 2: Vanishing $\beta$-bias of the $\beta$-soft oracle
Theorem 2: Asymptotic optimal length and conditional coverage
Remark 2
Proposition 1
proof
proof
...and 20 more
