Optimal training-conditional regret for online conformal prediction

Jiadong Liang, Zhimei Ren, Yuxin Chen

TL;DR

The paper proposes a split-conformal style algorithm that uses drift detection to adaptively update calibration sets and provably achieves minimax-optimal regret. It also establishes non-asymptotic regret guarantees for an online full conformal algorithm; these guarantees match the minimax lower bound under appropriate restrictions on the prediction sets.

Abstract

We study online conformal prediction for non-stationary data streams subject to unknown distribution drift. While most prior work has studied this problem under adversarial settings and/or assessed performance in terms of gaps of time-averaged marginal coverage, we instead evaluate performance through training-conditional cumulative regret. We specifically focus on independently generated data with two types of distribution shift: abrupt change points and smooth drift. When non-conformity score functions are pretrained on an independent dataset, we propose a split-conformal style algorithm that leverages drift detection to adaptively update calibration sets, which provably achieves minimax-optimal regret. When non-conformity scores are instead trained online, we develop a full-conformal style algorithm that again incorporates drift detection to handle non-stationarity; this approach relies on stability, rather than permutation symmetry, of the model-fitting algorithm, which is often better suited to online learning under evolving environments. We establish non-asymptotic regret guarantees for our online full conformal algorithm, which match the minimax lower bound under appropriate restrictions on the prediction sets. Numerical experiments corroborate our theoretical findings.
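
To make the split-conformal idea above concrete, the following Python sketch maintains a conformal threshold over a stream of non-conformity scores and resets the calibration set when recent miscoverage drifts away from the target level. It is an illustration under stated assumptions, not the paper's algorithm: the sliding-window test, the names `online_split_conformal`, `window`, and `c`, and the synthetic score stream are all hypothetical choices made for this example.

```python
import math
import random
from collections import deque


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    # Small tolerance guards against floating-point round-up before the ceiling.
    k = math.ceil((n + 1) * (1 - alpha) - 1e-12)
    if k > n:
        return math.inf  # too few calibration points for a finite threshold
    return sorted(scores)[k - 1]


def online_split_conformal(stream, alpha=0.1, window=200, c=0.5):
    """Restart-on-drift loop over a stream of non-conformity scores.

    `window` and `c` are hypothetical tuning knobs for this sketch only.
    Returns the threshold used at each round and the rounds at which the
    calibration set was reset.
    """
    calib = []                     # scores observed since the last restart
    errors = deque(maxlen=window)  # recent miscoverage indicators
    tolerance = c * math.sqrt(math.log(4 * window) / window)
    thresholds, restarts = [], []
    for t, score in enumerate(stream):
        q = conformal_quantile(calib, alpha)  # fixed before the score is seen
        thresholds.append(q)
        errors.append(1.0 if score > q else 0.0)
        calib.append(score)
        if len(errors) == window:
            gap = abs(sum(errors) / window - alpha)
            if gap > tolerance:  # empirical miscoverage is far from alpha
                calib.clear()
                errors.clear()
                restarts.append(t)
    return thresholds, restarts


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic scores |Z| whose scale jumps tenfold at t = 1000 (abrupt change).
    stream = [abs(rng.gauss(0.0, 1.0)) * (1.0 if t < 1000 else 10.0)
              for t in range(2000)]
    thresholds, restarts = online_split_conformal(stream)
    print("restart rounds:", restarts)
    print("final threshold:", round(thresholds[-1], 2))
```

Discarding the whole calibration set on detection is the simplest possible reaction to a change point. It trades a short period of uninformative (infinite) thresholds right after each restart for fast adaptation to the new distribution.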

Paper Structure

This paper contains 140 sections, 27 theorems, 398 equations, 4 figures, and 4 algorithms.

Key Result

Theorem 3.1

Suppose that Assumption assump:split holds. If we set the detection thresholds as $\sigma_{n,r} \coloneqq 24\sqrt{\log(4\tau_{n,r})}$ for every stage-round index pair $(n,r)$, then Algorithm alg:OCID achieves

Figures (4)

  • Figure 1: The case with a single change point at $t_0$. (Left) pointwise coverage $\mathbb{P}(s_t\le q)$ vs. $t$; (right) block coverage error $\mathsf{cvg}\text{-}\mathsf{err}^{\star}_q(1,t)$ vs. $t$ along with a detection threshold $\sigma$.
  • Figure 2: Schematic illustration of a case with smooth, oscillating distribution shifts. (Left) pointwise coverage $\mathbb{P}(s_t\le q)$ vs. $t$; (right) block coverage error $\mathsf{cvg}\text{-}\mathsf{err}^{\star}_q(1,t)$ vs. $t$.
  • Figure 3: Cumulative regret and calibration quantiles under four data-generating settings. Top row: cumulative regret trajectories. Bottom row: calibration-quantile evolution; the black dashed curve indicates an approximation to the ground-truth quantile obtained via repeated simulations at each time point. Across settings, ACI exhibits a clear stepsize trade-off: ACI with large constant stepsizes adapts quickly but produces volatile quantile updates and suboptimal performance under stationarity, whereas ACI with smaller or decaying stepsizes yields more stable updates at the cost of slower adaptation to distributional changes. In comparison, DriftOCP is stable within stationary time segments and adapts rapidly to distribution shifts, yielding consistently controlled regret. Curves are averaged over 20 runs; shaded bands indicate $\pm 1$ standard deviation.
  • Figure 4: Online conformal prediction with different score constructions. Top row: prediction-interval width over time. Bottom row: local coverage rate computed with a rolling window of 100 steps; the horizontal line marks the target level $1-\alpha=0.9$. Vertical dashed lines indicate change points at $t=3333$ and $t=6667$. Columns correspond to the four settings (well-specified vs. misspecified model, each under mean vs. variance drift). The adaptive-score method (online SGD) yields noticeably shorter intervals and more stable coverage under variance drift, whereas the pretrained-score method is sensitive to a mismatch between the pretraining and test covariate distributions. The model-free baseline ($s_t=\left|{Y_t}\right|$) is in general conservative and produces wide prediction intervals; it is also sensitive to distribution shift, exhibiting undercoverage at change points. Curves are averaged over 20 runs, with shaded bands indicating $\pm1$ standard deviation.
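
The local coverage rate reported in Figure 4 is a trailing-window average of coverage indicators. A minimal Python sketch of that computation follows. The function name `rolling_coverage` and the handling of the earliest rounds (averaging over however many rounds are available) are assumptions, since the caption only specifies the window length of 100 steps.

```python
from collections import deque


def rolling_coverage(covered, window=100):
    """Fraction of covered rounds within the trailing `window` rounds."""
    recent = deque()  # indicators currently inside the window
    total = 0         # running sum of those indicators
    rates = []
    for hit in covered:
        recent.append(1 if hit else 0)
        total += recent[-1]
        if len(recent) > window:
            total -= recent.popleft()  # drop the oldest indicator
        rates.append(total / len(recent))
    return rates


if __name__ == "__main__":
    # Toy intervals and outcomes; a round is covered when lo <= y <= hi.
    intervals = [(-1.0, 1.0), (-1.0, 1.0), (-1.0, 1.0), (-2.0, 2.0)]
    outcomes = [0.3, -0.8, 1.5, 1.5]
    covered = [lo <= y <= hi for (lo, hi), y in zip(intervals, outcomes)]
    print(rolling_coverage(covered, window=2))  # [1.0, 1.0, 0.5, 0.5]
```

Comparing the resulting curve against the target level $1-\alpha$ shows undercoverage around change points as dips below the horizontal line.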

Theorems & Definitions (36)

  • Theorem 3.1
  • Theorem 3.2
  • Theorem 4.1
  • Proposition 4.1
  • Remark 4.1
  • Theorem 4.2
  • Proposition 4.2
  • Proposition 4.3
  • Definition B.1
  • Lemma B.1
  • ...and 26 more