Robust Conformal Prediction Using Privileged Information

Shai Feldman, Yaniv Romano

TL;DR

This work addresses uncertainty quantification under corrupted training data by introducing Privileged Conformal Prediction (PCP), which leverages privileged information available only during training to correct distribution shift caused by corruption. PCP builds on weighted conformal prediction but avoids requiring test-time PI by computing a PI-informed, conservative threshold that guarantees marginal coverage: $\mathbb{P}(Y^{test} \in C^{PCP}(X^{test})) \ge 1-\alpha$. The method includes a scarce-data variant (LOO-PCP) and demonstrates strong empirical performance across causal inference (IHDP), missing response, and noisy label scenarios, achieving valid coverage with informative prediction sets. Overall, PCP provides a theoretically grounded, practical calibration scheme for robust uncertainty quantification in the presence of training-time corruptions, with broad applicability and public software support.

Abstract

We develop a method to generate prediction sets with a guaranteed coverage rate that is robust to corruptions in the training data, such as missing or noisy variables. Our approach builds on conformal prediction, a powerful framework to construct prediction sets that are valid under the i.i.d. assumption. Importantly, naively applying conformal prediction does not provide reliable predictions in this setting, due to the distribution shift induced by the corruptions. To account for the distribution shift, we assume access to privileged information (PI). The PI is formulated as additional features that explain the distribution shift; however, these features are available only during training and are absent at test time. We approach this problem by introducing a novel generalization of weighted conformal prediction and support our method with theoretical coverage guarantees. Empirical experiments on both real and synthetic datasets indicate that our approach achieves a valid coverage rate and constructs more informative predictions compared to existing methods, which are not supported by theoretical guarantees.
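
For readers unfamiliar with weighted conformal prediction, the building block that the abstract says the method generalizes, the following is a minimal sketch of the standard weighted calibration threshold. It is not the PCP procedure itself. The function name `weighted_conformal_threshold`, its parameters, and the toy numbers are illustrative assumptions rather than anything taken from the paper or its software. The weights are assumed to be likelihood ratios between the test and calibration distributions; in the corrupted-data setting they would depend on the PI, which is why the plain weighted scheme needs the test point's weight and therefore the test-time PI.

```python
import numpy as np

def weighted_conformal_threshold(scores, weights, test_weight, alpha):
    """Generic weighted split-conformal threshold (illustrative sketch).

    scores      : nonconformity scores of the calibration points
    weights     : assumed likelihood-ratio weight of each calibration point
    test_weight : assumed likelihood-ratio weight of the test point
    alpha       : target miscoverage level

    Returns the (1 - alpha) quantile of the weighted empirical distribution
    of the scores, where the test point contributes a point mass at +inf.
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # Sort the scores and accumulate their normalized weights.
    order = np.argsort(scores)
    sorted_scores = scores[order]
    total = weights.sum() + test_weight
    cumulative = np.cumsum(weights[order]) / total

    # Smallest score whose cumulative mass reaches 1 - alpha.
    idx = np.searchsorted(cumulative, 1.0 - alpha, side="left")
    if idx >= len(sorted_scores):
        # The required mass is only reached at the +inf atom.
        return np.inf
    return sorted_scores[idx]


# Toy usage with made-up scores: equal weights recover ordinary split conformal.
toy_scores = [1.0, 2.0, 3.0, 4.0]
q_unweighted = weighted_conformal_threshold(toy_scores, [1, 1, 1, 1], 1.0, alpha=0.5)
q_weighted = weighted_conformal_threshold(toy_scores, [1, 1, 1, 5], 1.0, alpha=0.5)
```

With equal weights the threshold coincides with the usual split-conformal quantile; up-weighting the calibration point with the largest score pushes the threshold higher, which widens the resulting prediction set.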

Paper Structure (49 sections, 6 theorems, 74 equations, 13 figures, 2 tables, 4 algorithms)

Key Result

Proposition 1

Suppose that $\{({X}_i(0),{X}_i(1), {Y}_i(0),{Y}_i(1),Z_i, M_i)\}_{i=1}^{n+1}$ are exchangeable, the observed covariates are clean, i.e., $\forall i: X^\textup{obs}_i=X_i(0)={X}_i(1)$, the covariate shift assumption holds, i.e., $({X}(0), {Y}(0)) \perp \!\!\! \perp M \mid Z$, and $P_{Z}$ is absolut

Figures (13)

  • Figure 1: Causal inference experiment: IHDP dataset. The coverage rate and average interval length achieved by naive jackknife+ (Naive CP), naive JAW which considers only $X$ to cope with the distribution shift (Naive WCP), an infeasible JAW which uses $Z^\text{test}$ (Infeasible WCP), and the proposed method (Privileged CP). The metrics are evaluated over 50 random data splits.
  • Figure 2: Missing response experiment. The coverage rate and average interval length obtained by various methods; see text for details. Performance metrics are evaluated over 20 random data splits.
  • Figure 3: Noisy response experiment: CIFAR-10N dataset. Average coverage and set size obtained by various methods; see text for details. The metrics are evaluated over 20 random data splits.
  • Figure 4: IHDP dataset experiment. The coverage rate and average interval length achieved by an uncalibrated quantile regression (Uncalibrated), a naive jackknife+ (Naive CP), JAW (Weighted CP) which estimates the corruption probability from either $X$ (orange), $Z$ (green), or uses the oracle probabilities (red), and the proposed method (Privileged CP) with the three options for the corruption probabilities. All methods are applied to attain a coverage rate at level $1 - 2\alpha = 90\%$. The metrics are evaluated over 50 random data splits.
  • Figure 5: Twins dataset experiment. The coverage rate and average set size achieved by naive conformal prediction (Naive CP), Weighted CP which estimates the corruption probability from either $X$ (orange), $Z$ (green), or uses the oracle probabilities (red), the baseline Two Staged CP, and the proposed method (Privileged CP) with the three options for the corruption probabilities. All methods are applied to attain a coverage rate at level $1 - \alpha = 90\%$. The metrics are evaluated over 20 random data splits.
  • ...and 8 more figures

Theorems & Definitions (15)

  • Example 1: Noisy response
  • Example 2: Missing features
  • Example 3: Missing response
  • Proposition 1
  • Theorem 1
  • Theorem 2
  • proof
  • proof
  • proof
  • Lemma 1
  • ...and 5 more