Stochastic Online Conformal Prediction with Semi-Bandit Feedback

Haosen Ge; Hamsa Bastani; Osbert Bastani

TL;DR

This paper addresses online conformal prediction under semi-bandit feedback, where true labels are revealed only if they lie in the predicted set. It introduces the Semi-Bandit Prediction Set (SPS) algorithm, which uses a high-probability upper bound on the miscoverage CDF via a DKW-based inequality to adapt thresholds $\tau_t$ and ensure coverage $P(y_t^* \in C_t) \ge \alpha$ while keeping prediction sets compact. The authors prove a regret bound of $R_T = O(\sqrt{T})$ (up to logarithmic factors) and that $C_t^* \subseteq C_t$ holds with high probability, even with semi-bandit feedback. Empirically, SPS demonstrates strong performance across image classification, document retrieval, and second-price auction tasks, achieving the target coverage with smaller prediction sets and outperforming several baselines.
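The threshold-adaptation idea can be illustrated with a minimal sketch. It is not the authors' SPS implementation: the function names, the Uniform(0, 1) score distribution, and the choice `delta = 1 / T**2` are assumptions made for illustration only. The sketch assumes a prediction set of the form "all labels with score at least $\tau_t$", raises $\tau_t$ only when the empirical CDF of true-label scores plus a DKW deviation term stays below $1 - \alpha$, and relies on the fact that censored rounds (true label outside the set) must have had scores below the threshold then in force.

```python
import bisect
import math
import random


def dkw_epsilon(n, delta):
    """DKW deviation: sup |F_n - F| <= eps with probability >= 1 - delta."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def next_threshold(sorted_scores, n_rounds, tau_prev, alpha, delta):
    """Largest observed score tau > tau_prev whose CDF upper bound is <= 1 - alpha.

    sorted_scores: revealed true-label scores, kept in ascending order.
    Censored rounds revealed nothing, but their scores were below the threshold
    in force at the time, hence below tau_prev (thresholds never decrease).
    So for any tau >= tau_prev the number of scores below tau is known exactly.
    """
    eps = dkw_epsilon(n_rounds, delta)
    n_censored = n_rounds - len(sorted_scores)
    best = tau_prev
    start = bisect.bisect_right(sorted_scores, tau_prev)
    for tau in sorted_scores[start:]:
        below = n_censored + bisect.bisect_left(sorted_scores, tau)
        if below / n_rounds + eps > 1.0 - alpha:
            break  # the bound is monotone in tau, so later candidates fail too
        best = tau
    return best


def simulate(T=2000, alpha=0.9, seed=0):
    """Run the sketch on synthetic Uniform(0, 1) true-label scores (an assumption)."""
    rng = random.Random(seed)
    tau, scores, covered = 0.0, [], 0  # tau = 0 means every label is included
    for t in range(1, T + 1):
        s = rng.random()  # score of the true label in round t
        if s >= tau:  # semi-bandit feedback: revealed only when covered
            bisect.insort(scores, s)
            covered += 1
        tau = next_threshold(scores, t, tau, alpha, delta=1.0 / T**2)
    return tau, covered / T


if __name__ == "__main__":
    final_tau, coverage = simulate()
    print(f"final threshold: {final_tau:.4f}, empirical coverage: {coverage:.4f}")
```

Under the uniform assumption the optimal threshold for $\alpha = 0.9$ is $0.1$, so the sketch should approach that value from below as the DKW term shrinks, which mirrors the conservative behavior described in the TL;DR.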

Abstract

Conformal prediction has emerged as an effective strategy for uncertainty quantification by modifying a model to output sets of labels instead of a single label. These prediction sets come with the guarantee that they contain the true label with high probability. However, conformal prediction typically requires a large calibration dataset of i.i.d. examples. We consider the online learning setting, where examples arrive over time, and the goal is to construct prediction sets dynamically. Departing from existing work, we assume semi-bandit feedback, where we only observe the true label if it is contained in the prediction set. For instance, consider calibrating a document retrieval model to a new domain; in this setting, a user would only be able to provide the true label if the target document is in the prediction set of retrieved documents. We propose a novel conformal prediction algorithm targeted at this setting, and prove that it obtains sublinear regret compared to the optimal conformal predictor. We evaluate our algorithm on a retrieval task, an image classification task, and an auction price-setting task, and demonstrate that it empirically achieves good performance compared to several baselines.
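The semi-bandit feedback model from the retrieval example can be made concrete with a small sketch. The document names, scores, and function names below are hypothetical and serve only to illustrate the feedback rule: the learner sees the true label when it lies in the prediction set and sees nothing otherwise.

```python
from typing import Dict, Optional, Set


def prediction_set(scores: Dict[str, float], tau: float) -> Set[str]:
    """All candidate labels whose model score is at least the threshold tau."""
    return {label for label, s in scores.items() if s >= tau}


def semi_bandit_feedback(pred_set: Set[str], true_label: str) -> Optional[str]:
    """Reveal the true label only when it is contained in the prediction set."""
    return true_label if true_label in pred_set else None


# Hypothetical retrieval scores for one query.
doc_scores = {"doc_a": 0.82, "doc_b": 0.40, "doc_c": 0.07}

wide = prediction_set(doc_scores, tau=0.05)    # all three documents
narrow = prediction_set(doc_scores, tau=0.50)  # only doc_a

print(semi_bandit_feedback(wide, "doc_b"))    # label is revealed
print(semi_bandit_feedback(narrow, "doc_b"))  # no feedback for this round
```

A lower threshold yields larger sets and more feedback, while a higher threshold yields smaller sets but risks rounds in which nothing is learned about the true label.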

Paper Structure (9 sections, 5 theorems, 36 equations, 6 figures, 1 algorithm)

Introduction
Problem Formulation
Algorithm
Proof of Theorem 3.1
Experiments
Experimental Setup
Results
Conclusion
Additional Experiments

Key Result

Theorem 3.1

Algorithm 1 satisfies the regret bound $R_T = O(\sqrt{T})$, up to logarithmic factors. Also, with probability at least $1 - 2/T$, $\tau_t \leq \tau^*$ for all $t$.
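The following derivation sketches why a DKW-based upper bound on the score CDF can keep the threshold below the optimal one. The notation is an assumption made for illustration and may differ from the paper's: $s_t = f(x_t, y_t^*)$ is the score of the true label, $F(\tau) = P(s_t < \tau)$, the prediction set is $C_t = \{y : f(x_t, y) \ge \tau_t\}$, $\tau^* = \sup\{\tau : F(\tau) \le 1 - \alpha\}$, and $\hat{F}_n$ is the empirical CDF from $n$ i.i.d. scores.

```latex
% DKW inequality (with Massart's constant): with probability at least 1 - \delta,
\sup_{\tau} \bigl| \hat{F}_n(\tau) - F(\tau) \bigr|
  \;\le\; \epsilon_n := \sqrt{\frac{\ln(2/\delta)}{2n}} .

% On this event, a threshold chosen so that the upper bound is at most 1 - \alpha
% also satisfies the true constraint:
\hat{F}_n(\tau_t) + \epsilon_n \le 1 - \alpha
  \;\Longrightarrow\;
F(\tau_t) \le \hat{F}_n(\tau_t) + \epsilon_n \le 1 - \alpha .

% Hence the coverage target holds and the threshold is conservative:
P\bigl(y_t^* \in C_t\bigr) = 1 - F(\tau_t) \ge \alpha ,
\qquad
\tau_t \le \tau^* ,
\qquad
C_t^* \subseteq C_t .
```

The inclusion $C_t^* \subseteq C_t$ follows because a lower threshold admits every label that the optimal threshold admits.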

Figures (6)

Figure 1: Cumulative Regret
Figure 2: Coverage Rate
Figure 3: Undercoverage Count
Figure 4: Cumulative Regret
Figure 5: Coverage Rate
...and 1 more figure

Theorems & Definitions (9)

Theorem 3.1
Lemma 4.1
proof
Lemma 4.2
proof
Lemma 4.3
proof
Theorem 4.3
proof
