
The Benefit of Being Bayesian in Online Conformal Prediction

Zhiyu Zhang, Zhou Lu, Heng Yang

TL;DR

This work addresses online conformal prediction under adversarial data by introducing a Bayesian-regularized CP algorithm that answers multiple confidence-level ($\alpha$) queries online. The key idea is to maintain a belief $P_t$ that blends a prior with the empirical distribution of past scores and to output its quantiles, $r_t(\alpha)=q_\alpha(P_t)$. This amounts to a non-linearized FTRL-like update with provable regret $O(R\sqrt{T})$ for every $\alpha$, and, unlike first-order baselines, it avoids the monotonicity issue. If the data are actually iid, the same algorithm recovers near-ERM guarantees with dataset-conditional coverage, and the belief has a Bayesian interpretation as a Dirichlet-process posterior mean. Extensions include a memory-efficient quantized version and a discounted variant for continual distribution shift, both supported by experiments on synthetic switching sequences and stock-price data. Overall, the approach combines the strengths of data-centric CP with Bayesian regularization to deliver robust, multi-$\alpha$ online confidence sets for real-world risk assessment.
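The belief-then-quantile idea can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the authors' implementation: scores lie in $[0, R]$, the prior $P_0$ is uniform on $[0, R]$, and the belief takes the Dirichlet-process posterior-mean form $P_t = (\lambda P_0 + \sum_{i<t}\delta_{s_i})/(\lambda + t - 1)$ with a hypothetical concentration parameter `lam`. The class `BayesianQuantileBelief` and its methods are likewise illustrative names. Here $\alpha$ is read as the quantile level, matching $r_t(\alpha)=q_\alpha(P_t)$, and one shared belief answers every $\alpha$ query.

```python
import bisect
import random


class BayesianQuantileBelief:
    """Illustrative prior-regularized belief over scores (not the paper's code).

    Assumptions (ours): scores lie in [0, R], the prior P_0 is Uniform[0, R], and
    the belief is the Dirichlet-process posterior mean
        P_t = (lam * P_0 + sum_i delta_{s_i}) / (lam + n).
    """

    def __init__(self, R=1.0, lam=1.0):
        self.R = R
        self.lam = lam
        self.scores = []  # observed scores, kept sorted for fast counting

    def cdf(self, r):
        """P_t(score <= r): prior CDF mass plus empirical counts, renormalized."""
        prior_cdf = min(max(r / self.R, 0.0), 1.0)
        count = bisect.bisect_right(self.scores, r)
        return (self.lam * prior_cdf + count) / (self.lam + len(self.scores))

    def quantile(self, alpha, tol=1e-9):
        """Smallest r in [0, R] with cdf(r) >= alpha, located to within tol by bisection."""
        lo, hi = 0.0, self.R
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if self.cdf(mid) >= alpha:
                hi = mid
            else:
                lo = mid
        return hi

    def update(self, s):
        """Add a newly revealed score to the empirical part of the belief."""
        bisect.insort(self.scores, s)


if __name__ == "__main__":
    random.seed(0)
    belief = BayesianQuantileBelief(R=1.0, lam=1.0)
    alphas = [0.5, 0.8, 0.9]
    hits = {a: 0 for a in alphas}
    T = 2000
    for _ in range(T):
        # One shared belief answers every alpha query before the score is revealed.
        thresholds = {a: belief.quantile(a) for a in alphas}
        s = random.random()  # toy assumption: iid Uniform[0, 1] scores
        for a in alphas:
            hits[a] += s <= thresholds[a]
        belief.update(s)
    for a in alphas:
        print(a, hits[a] / T)
```

Because the toy scores are iid and match the prior, the printed coverage frequencies should sit close to the requested levels; the exact values depend on the random seed. Since all thresholds are quantiles of the same belief, they are ordered in $\alpha$ by construction.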

Abstract

Based on the framework of Conformal Prediction (CP), we study the online construction of confidence sets given a black-box machine learning model. By converting the target confidence levels into quantile levels, the problem can be reduced to predicting the quantiles (in hindsight) of a sequentially revealed data sequence. Two very different approaches have been studied previously: (i) Assuming the data sequence is iid or exchangeable, one could maintain the empirical distribution of the observed data as an algorithmic belief, and directly predict its quantiles. (ii) Due to the fragility of statistical assumptions, a recent trend is to consider the non-distributional, adversarial setting and apply first-order online optimization algorithms to moving quantile losses. However, this approach requires oracle knowledge of the target quantile level, and suffers from a previously overlooked monotonicity issue due to the associated loss linearization. This paper presents an adaptive CP algorithm that combines their strengths. Without any statistical assumption, it is able to answer multiple arbitrary confidence level queries with low regret, while also overcoming the monotonicity issue suffered by first-order optimization baselines. Furthermore, if the data sequence is actually iid, then the same algorithm is automatically equipped with the "correct" coverage probability guarantee. To achieve such strengths, our key technical innovation is to regularize the aforementioned algorithmic belief (the empirical distribution) by a Bayesian prior, which robustifies it by simulating a non-linearized Follow the Regularized Leader (FTRL) algorithm on the output. Such a belief update backbone is shared by prediction heads targeting different confidence levels, bringing practical benefits analogous to the recently proposed concept of U-calibration (Kleinberg et al., 2023).
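To make the monotonicity issue of approach (ii) concrete, here is a toy sketch: plain online gradient descent (OGD), which is FTRL on linearized losses with a quadratic regularizer, run separately for each quantile level on the quantile (pinball) loss $\ell_\alpha(r,s)=\alpha(s-r)_+ + (1-\alpha)(r-s)_+$. The step size `lr`, the starting point `r0`, the function names, and the three-score sequence are our own illustrative choices, not the paper's baselines.

```python
def pinball_subgradient(r, s, alpha):
    """A subgradient in r of l_alpha(r, s) = alpha*(s - r)_+ + (1 - alpha)*(r - s)_+."""
    return -alpha if s > r else 1.0 - alpha


def ogd_thresholds(scores, alpha, lr=0.1, r0=0.0):
    """Online gradient descent on the linearized quantile loss for a single alpha.

    Returns the threshold held after each score is revealed.
    """
    r = r0
    path = []
    for s in scores:
        r = r - lr * pinball_subgradient(r, s, alpha)  # first-order step
        path.append(r)
    return path


if __name__ == "__main__":
    scores = [1.0, 0.0, 0.07]
    low = ogd_thresholds(scores, alpha=0.8)
    high = ogd_thresholds(scores, alpha=0.9)
    for t, (a, b) in enumerate(zip(low, high), start=1):
        print(t, round(a, 4), round(b, 4), "crossed" if a > b else "ok")
```

After the third score, the $\alpha=0.8$ threshold (0.14) sits above the $\alpha=0.9$ threshold (0.07): independently run first-order updates need not preserve the ordering across levels, whereas quantiles of a single shared belief always do.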

Paper Structure

This paper contains 30 sections, 15 theorems, 66 equations, 7 figures, and 1 algorithm.

Key Result

Theorem 1

For all $\alpha\in[0,1]$, the output $r_t(\alpha)$ of Algorithm 1 satisfies $r_1(\alpha)=\mathop{\mathrm{arg\,min}}\limits_{r\in\mathbb{R}}\psi(r)$, and, for all $t\geq 2$, … Specifically, …
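The link between the belief quantile and FTRL can be checked numerically. This sketch rests on our own assumptions and does not reconstruct the theorem's missing display: scores lie in $[0,1]$, the prior is $P_0=\mathrm{Uniform}[0,1]$, and the regularizer is taken to be $\psi(r)=\lambda\,\mathbb{E}_{x\sim P_0}[\ell_\alpha(r,x)]$, where $\ell_\alpha$ is the quantile loss. Under these assumptions, the non-linearized FTRL output $\arg\min_r\big[\psi(r)+\sum_{i<t}\ell_\alpha(r,s_i)\big]$ coincides with the $\alpha$-quantile of $(\lambda P_0+\sum_{i<t}\delta_{s_i})/(\lambda+t-1)$, because the objective's derivative is $(\lambda+t-1)\big(F_t(r)-\alpha\big)$ away from the atoms. At $t=1$ it reduces to $\arg\min_r\psi(r)=\alpha$. All function names are hypothetical.

```python
def quantile_loss(r, s, alpha):
    """l_alpha(r, s) = alpha * (s - r)_+ + (1 - alpha) * (r - s)_+."""
    return alpha * max(s - r, 0.0) + (1.0 - alpha) * max(r - s, 0.0)


def psi(r, alpha, lam):
    """Assumed regularizer lam * E_{x ~ Uniform[0,1]} l_alpha(r, x), closed form for r in [0, 1]."""
    return lam * (alpha * (1.0 - r) ** 2 / 2.0 + (1.0 - alpha) * r ** 2 / 2.0)


def ftrl_threshold(scores, alpha, lam, grid_size=10_001):
    """Non-linearized FTRL: minimize psi + cumulative quantile loss over a grid on [0, 1]."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]

    def objective(r):
        return psi(r, alpha, lam) + sum(quantile_loss(r, s, alpha) for s in scores)

    return min(grid, key=objective)


def belief_quantile(scores, alpha, lam, tol=1e-9):
    """alpha-quantile of (lam * Uniform[0,1] + sum_i delta_{s_i}) / (lam + n), by bisection."""
    n = len(scores)

    def cdf(r):
        return (lam * r + sum(s <= r for s in scores)) / (lam + n)

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) >= alpha:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    scores = [0.9, 0.1, 0.5, 0.7, 0.2]
    for alpha in (0.1, 0.5, 0.9):
        print(alpha, ftrl_threshold(scores, alpha, lam=1.0), belief_quantile(scores, alpha, lam=1.0))
```

The two columns agree up to the grid spacing. The grid search is only for transparency; the quantile form is what makes the update cheap, since it never solves an optimization problem explicitly.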

Figures (7)

  • Figure 1: The CP interaction protocol.
  • Figure 2: Evaluating the monotonicity of threshold predictions. Ideally, the orange line should always lie above the blue line, since its target confidence level is higher. Columns correspond to different algorithms; rows correspond to different confidence level pairs.
  • Figure 3: Regret on switching data.
  • Figure 4: Predicted score threshold on AMD stock data.
  • Figure 5: Quantile loss on AMD stock data.
  • ...and 2 more figures
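The figures above report monotonicity, regret, and quantile loss. As a hedged sketch of how such quantities can be computed, the helpers below use our own definitions, which may differ from the paper's exact evaluation protocol: static regret against the best fixed threshold in hindsight, and a crossing rate equal to the fraction of rounds in which the higher-level threshold falls strictly below the lower-level one. The function names are hypothetical.

```python
def quantile_loss(r, s, alpha):
    """l_alpha(r, s) = alpha * (s - r)_+ + (1 - alpha) * (r - s)_+."""
    return alpha * max(s - r, 0.0) + (1.0 - alpha) * max(r - s, 0.0)


def static_regret(thresholds, scores, alpha):
    """Cumulative quantile loss minus that of the best fixed threshold in hindsight.

    The comparator's total loss is convex and piecewise linear in r with kinks at
    the observed scores, so searching over the (non-empty) scores finds its minimum.
    """
    online = sum(quantile_loss(r, s, alpha) for r, s in zip(thresholds, scores))
    best = min(sum(quantile_loss(c, s, alpha) for s in scores) for c in scores)
    return online - best


def crossing_rate(low_level_thresholds, high_level_thresholds):
    """Fraction of rounds where the higher-confidence threshold is strictly below the lower one."""
    pairs = list(zip(low_level_thresholds, high_level_thresholds))
    return sum(hi < lo for lo, hi in pairs) / len(pairs)


if __name__ == "__main__":
    print(static_regret([0.5, 0.5], [0.2, 0.8], alpha=0.9))
    print(crossing_rate([0.1, 0.3, 0.2], [0.2, 0.25, 0.4]))
```

A monotone method, such as taking quantiles of one shared belief, has a crossing rate of exactly zero, which is the property Figure 2 checks visually.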

Theorems & Definitions (21)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • Theorem 6
  • Theorem 6
  • Proof of the main theorem
  • Theorem 6
  • Proof of the regret theorem
  • ...and 11 more