Robust Conformal Prediction Using Privileged Information

Shai Feldman; Yaniv Romano

TL;DR

This work addresses uncertainty quantification under corrupted training data by introducing Privileged Conformal Prediction (PCP), which leverages privileged information available only during training to correct distribution shift caused by corruption. PCP builds on weighted conformal prediction but avoids requiring test-time PI by computing a PI-informed, conservative threshold that guarantees marginal coverage: $\mathbb{P}(Y^{test} \in C^{PCP}(X^{test})) \ge 1-\alpha$. The method includes a scarce-data variant (LOO-PCP) and demonstrates strong empirical performance across causal inference (IHDP), missing response, and noisy label scenarios, achieving valid coverage with informative prediction sets. Overall, PCP provides a theoretically grounded, practical calibration scheme for robust uncertainty quantification in the presence of training-time corruptions, with broad applicability and public software support.
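For orientation, the weighted conformal construction that PCP builds on can be written in the following form. This is a sketch of the standard weighted conformal prediction set, specialised to weights that depend on the PI $Z$. The symbols $s$ (nonconformity score), $s_i = s(X_i, Y_i)$, $w$, $p_i$, and the convention that $M = 0$ marks an uncorrupted sample are notational assumptions made here and may differ from the paper's notation.

$$
C^{\mathrm{WCP}}(x, z) = \Big\{ y : s(x, y) \le \mathrm{Quantile}\Big(1-\alpha;\ \sum_{i=1}^{n} p_i(z)\,\delta_{s_i} + p_{n+1}(z)\,\delta_{\infty}\Big) \Big\},
$$

$$
p_i(z) = \frac{w(Z_i)}{\sum_{j=1}^{n} w(Z_j) + w(z)}, \qquad
p_{n+1}(z) = \frac{w(z)}{\sum_{j=1}^{n} w(Z_j) + w(z)}, \qquad
w(z) \propto \frac{1}{\mathbb{P}(M = 0 \mid Z = z)}.
$$

Evaluating this set at a test point requires $w(Z^{test})$, which is unavailable when the PI is absent at test time. That is the gap the PI-informed threshold described above is meant to close.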

Abstract

We develop a method to generate prediction sets with a guaranteed coverage rate that is robust to corruptions in the training data, such as missing or noisy variables. Our approach builds on conformal prediction, a powerful framework for constructing prediction sets that are valid under the i.i.d. assumption. Importantly, naively applying conformal prediction does not provide reliable predictions in this setting, due to the distribution shift induced by the corruptions. To account for the distribution shift, we assume access to privileged information (PI). The PI is formulated as additional features that explain the distribution shift; however, these features are available only during training and are absent at test time. We approach this problem by introducing a novel generalization of weighted conformal prediction and support our method with theoretical coverage guarantees. Empirical experiments on both real and synthetic datasets indicate that our approach achieves a valid coverage rate and constructs more informative predictions than existing methods, which are not supported by theoretical guarantees.
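The failure of naive calibration described above can be illustrated with a toy missing-response simulation. The sketch below is not the authors' PCP algorithm and does not use their data or code: the data-generating process, the function `p_clean`, and the fixed predictor $f(x) = x$ are all hypothetical choices made for illustration. It compares a naive split-conformal threshold, calibrated only on uncorrupted samples, with a weighted conformal threshold that uses oracle weights and the test-time PI, i.e. the kind of infeasible baseline the Figure 1 caption calls "Infeasible WCP".

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.1  # target miscoverage level


def sample(n):
    """Hypothetical toy data: the PI z is the noise that drives y."""
    x = rng.normal(size=n)
    z = rng.normal(size=n)  # privileged information
    return x, z, x + z


def p_clean(z):
    """Assumed oracle probability that the response is observed, given z."""
    return np.where(np.abs(z) < 1.0, 0.9, 0.1)


# Calibration data: only samples with an observed (clean) response are usable.
x, z, y = sample(20000)
clean = rng.random(x.size) < p_clean(z)
scores = np.abs(y[clean] - x[clean])  # absolute residual of predictor f(x) = x
weights = 1.0 / p_clean(z[clean])     # oracle likelihood-ratio weights (up to a constant)
n = scores.size

# Test data are drawn from the uncorrupted target distribution.
x_t, z_t, y_t = sample(5000)
test_scores = np.abs(y_t - x_t)

# Naive split conformal: unweighted quantile of the clean calibration scores.
k = int(np.ceil((1 - alpha) * (n + 1)))
q_naive = np.sort(scores)[k - 1]
naive_cov = float(np.mean(test_scores <= q_naive))

# Weighted conformal with test-time PI (infeasible in the paper's setting):
# smallest score whose cumulative weight reaches (1 - alpha) of the total,
# where the total includes the test point's own weight placed at +infinity.
order = np.argsort(scores)
s_sorted = scores[order]
cum_w = np.cumsum(weights[order])
w_test = 1.0 / p_clean(z_t)
target = (1 - alpha) * (cum_w[-1] + w_test)
idx = np.searchsorted(cum_w, target, side="left")
q_weighted = np.where(idx < n, s_sorted[np.minimum(idx, n - 1)], np.inf)
weighted_cov = float(np.mean(test_scores <= q_weighted))

print(f"naive coverage:    {naive_cov:.3f}")
print(f"weighted coverage: {weighted_cov:.3f}")
```

In this toy setup, large-noise samples are rarely observed, so the naive threshold should undercover markedly, while the oracle-weighted threshold should land near the nominal $1-\alpha$ level. The weighted variant reads `z_t` at test time, which is exactly what the setting considered here rules out.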

Paper Structure (49 sections, 6 theorems, 74 equations, 13 figures, 2 tables, 4 algorithms)

Introduction
Motivation
Problem setup
Our contribution
Background and related work
Conformal prediction
Weighted conformal prediction
Additional related work
Proposed method
A naive approach: Two-Staged Conformal
Our main proposal: Privileged Conformal Prediction
Privileged Conformal Prediction for scarce data
Applications
Causal inference: semi-synthetic example
Missing response variable: semi-synthetic example
...and 34 more sections

Key Result

Proposition 1

Suppose that $\{({X}_i(0),{X}_i(1), {Y}_i(0),{Y}_i(1),Z_i, M_i)\}_{i=1}^{n+1}$ are exchangeable, the observed covariates are clean, i.e., $\forall i: X^\textup{obs}_i=X_i(0)={X}_i(1)$, the covariate shift assumption holds, i.e., $({X}(0), {Y}(0)) \perp \!\!\! \perp M \mid Z$, and $P_{Z}$ is absolut

Figures (13)

Figure 1: Causal inference experiment: IHDP dataset. The coverage rate and average interval length achieved by naive jackknife+ (Naive CP), naive JAW which considers only $X$ to cope with the distribution shift (Naive WCP), an infeasible JAW which uses $Z^\text{test}$ (Infeasible WCP), and the proposed method (Privileged CP). The metrics are evaluated over 50 random data splits.
Figure 2: Missing response experiment. The coverage rate and average interval length obtained by various methods; see text for details. Performance metrics are evaluated over 20 random data splits.
Figure 3: Noisy response experiment: CIFAR-10N dataset. Average coverage and set size obtained by various methods; see text for details. The metrics are evaluated over 20 random data splits.
Figure 4: IHDP dataset experiment. The coverage rate and average interval length achieved by an uncalibrated quantile regression (Uncalibrated), a naive jackknife+ (Naive CP), JAW (Weighted CP) which estimates the corruption probability from either $X$ (orange), $Z$ (green), or uses the oracle probabilities (red), and the proposed method (Privileged CP) with the three options for the corruption probabilities. All methods are applied to attain a coverage rate at level $1 - 2\alpha = 90\%$. The metrics are evaluated over 50 random data splits.
Figure 5: Twins dataset experiment. The coverage rate and average set size achieved by naive conformal prediction (Naive CP), Weighted CP which estimates the corruption probability from either $X$ (orange), $Z$ (green), or uses the oracle probabilities (red), the baseline Two Staged CP, and the proposed method (Privileged CP) with the three options for the corruption probabilities. All methods are applied to attain a coverage rate at level $1 - \alpha = 90\%$. The metrics are evaluated over 20 random data splits.
...and 8 more figures

Theorems & Definitions (15)

Example 1: Noisy response
Example 2: Missing features
Example 3: Missing response
Proposition 1
Theorem 1
Theorem 2
proof
proof
proof
Lemma 1
...and 5 more
