Online Selective Conformal Prediction: Errors and Solutions

Yusuf Sale, Aaditya Ramdas

TL;DR

The paper addresses uncertainty quantification under online selective reporting, where only selected observations receive prediction intervals. It shows that several existing calibration selection strategies fail to preserve exchangeability with the selected test datum, which undermines their claimed selection-conditional coverage and FCR control. To fix this, it introduces exchangeability-preserving calibration strategies (EXPRESS, K_EXPRESS, and EXPRESS_M) along with a merging approach, and proves selection-conditional coverage and FCR control under these schemes. Experiments illustrate the trade-off between calibration-set size and interval informativeness and compare these methods with conformal LORD-CI and adaptive conformal inference, offering practical guidance for online selective conformal prediction.

Abstract

In online selective conformal inference, data arrives sequentially, and prediction intervals are constructed only when an online selection rule is met. Since online selections may break the exchangeability between the selected test datum and the rest of the data, one must correct for this by suitably selecting the calibration data. In this paper, we evaluate existing calibration selection strategies and pinpoint some fundamental errors in the associated claims that guarantee selection-conditional coverage and control of the false coverage rate (FCR). To address these shortcomings, we propose novel calibration selection strategies that provably preserve the exchangeability of the calibration data and the selected test datum. Consequently, we demonstrate that online selective conformal inference with these strategies guarantees both selection-conditional coverage and FCR control. Our theoretical findings are supported by experimental evidence examining tradeoffs between valid methods.
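To make the setting concrete, the following is a minimal sketch of an online selective loop with an empirical FCR tally. It is not one of the paper's strategies: it assumes absolute-residual scores, a hypothetical threshold selection rule, labels revealed after each step, and a naive calibration set made of all past points, which is the kind of uncorrected choice the abstract warns may break exchangeability once selection is involved. All function names are illustrative.

```python
import math


def conformal_quantile(scores, alpha):
    """Return the ceil((n+1)(1-alpha))-th smallest score, or inf if n is too small."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points: interval has infinite length
    return sorted(scores)[k - 1]


def empirical_fcr(selected, covered):
    """Fraction of selected steps whose interval missed the label (0 if none selected)."""
    misses = sum(1 for s, c in zip(selected, covered) if s and not c)
    return misses / max(1, sum(selected))


def run_online(points, alpha, threshold):
    """points: (prediction, label) pairs arriving in order; returns the empirical FCR."""
    scores, selected, covered = [], [], []
    for pred, label in points:
        is_selected = pred > threshold  # hypothetical selection rule (assumption)
        if is_selected:
            # Interval is [pred - q, pred + q]; report it only for selected steps.
            q = conformal_quantile(scores, alpha)
            covered.append(abs(label - pred) <= q)
        else:
            covered.append(False)  # no interval reported; ignored by empirical_fcr
        selected.append(is_selected)
        # Naive choice (assumption): every past point joins the calibration set.
        scores.append(abs(label - pred))
    return empirical_fcr(selected, covered)
```

Swapping the naive `scores.append(...)` step for a rule that decides which past points may enter the calibration set is where a calibration selection strategy would plug in.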

Paper Structure

This paper contains 20 sections, 11 theorems, 52 equations, 8 figures, 1 table, 1 algorithm.

Key Result

Lemma 4.2

Let $\{\mathcal{S}_t\}_{t \geq 0}$ be a sequence of decision driven selection rules and denote by $\{\mathcal{S}^{\widetilde{\pi}}_t\}_{t \geq 0}$ the sequence of selection rules generated when operating on $Z_{\widetilde{\pi}(-n)},...,Z_{\widetilde{\pi}(t)}$. The selection rule sequences $\{\mathca

Figures (8)

  • Figure 1: An illustrative example of the online selective (conformal prediction) setting. (a) Usual online conformal prediction setting, where one constructs for each time $t \geq 0$ a prediction interval. (b) In the online selective conformal predictive setting, we only report prediction intervals for selected ($\bullet$) times, while no prediction intervals are constructed for those that are not selected ($\bf{\times}$).
  • Figure 2: Miscoverage is shown alongside the number of calibration points, the median interval length, and the fraction of infinite-length prediction intervals. We highlight provably correct methods (✔) and the target level. All metrics are averaged over $N = 10^6$ runs.
  • Figure 3: (a) FCR as a function of time $T$. The dashed black line represents the target level $\alpha = 0.4$. We highlight provably correct methods (✔). (b) Number of calibration data points used over time. Strategies accumulating more calibration data tend to yield shorter prediction intervals. (c) Fraction of prediction intervals of infinite length over time. A high fraction suggests a strategy often fails to provide informative intervals. Only reported for novel strategies. (d) Median prediction interval length over time. Shorter intervals indicate higher informativeness. All metrics are averaged over $N = 10^4$ runs.
  • Figure 4: (a) FCR as a function of time $T$. The dashed black line represents the target level $\alpha = 0.4$. We highlight provably correct methods (✔). (b) Median prediction interval length over time. Shorter intervals indicate higher informativeness. All metrics are averaged over $N = 10^4$ runs.
  • Figure 5: Miscoverage is shown alongside the number of calibration points, the median interval length, and the fraction of infinite-length prediction intervals. We highlight provably correct methods (✔) and the target level. All metrics are averaged over $N = 10^6$ runs.
  • ...and 3 more figures

Theorems & Definitions (21)

  • Definition 2.1
  • Remark 3.1
  • Lemma 4.2
  • proof
  • Lemma 4.3
  • proof
  • Lemma 4.4
  • proof
  • Theorem 4.5
  • Proposition 5.2
  • ...and 11 more