Coverage Guarantees for Pseudo-Calibrated Conformal Prediction under Distribution Shift
Farbod Siahkali, Ashwin Verma, Vijay Gupta
TL;DR
This paper addresses the breakdown of conformal prediction guarantees under distribution shift when target labels are unavailable. Using insights from domain adaptation, it lower-bounds target coverage in terms of the source classifier loss and a Wasserstein measure of the shift; the resulting bound includes $L_r(f,P)$ and a shift term. It then introduces relaxed pseudo-calibrated sets whose slack parameter keeps target coverage above a prescribed level. A source-tuned pseudo-calibration algorithm interpolates between hard pseudo-labels and randomized labels according to an uncertainty measure, reducing conservatism while preserving coverage. Numerical experiments on MNIST and CIFAR datasets show that the bounds qualitatively track pseudo-calibration behavior and that the proposed method mitigates coverage degradation under distribution shift with reasonable prediction-set sizes. Overall, the work provides theory-backed, practical tools for reliable multiclass prediction under shift in the absence of target labels.
Abstract
Conformal prediction (CP) offers distribution-free marginal coverage guarantees under an exchangeability assumption, but these guarantees can fail if the data distribution shifts. We analyze the use of pseudo-calibration as a tool to counter this performance loss under a bounded label-conditional covariate shift model. Using tools from domain adaptation, we derive a lower bound on target coverage in terms of the source-domain loss of the classifier and a Wasserstein measure of the shift. Using this result, we provide a method to design pseudo-calibrated sets that inflate the conformal threshold by a slack parameter to keep target coverage above a prescribed level. Finally, we propose a source-tuned pseudo-calibration algorithm that interpolates between hard pseudo-labels and randomized labels as a function of classifier uncertainty. Numerical experiments show that our bounds qualitatively track pseudo-calibration behavior and that the source-tuned scheme mitigates coverage degradation under distribution shift while maintaining nontrivial prediction set sizes.
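To make the pseudo-calibration idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes the common nonconformity score $1 - p_y(x)$, calibrates on unlabeled target points using the classifier's own labels, and adds a slack to the conformal threshold. The abstract does not specify the interpolation rule, so the one used here (swap the hard label for a uniformly random one with probability equal to the normalized entropy) is a hypothetical choice, as are all function and parameter names.

```python
import numpy as np


def conformal_quantile(scores, alpha):
    """Order statistic of rank ceil((n + 1)(1 - alpha)) among the scores.

    Simplification: the rank is clipped to n; the textbook rule returns
    +inf (the full label set) when the rank exceeds n.
    """
    n = len(scores)
    k = min(int(np.ceil((n + 1) * (1 - alpha))), n)
    return np.sort(scores)[k - 1]


def normalized_entropy(probs):
    """Predictive entropy divided by log(K), so values lie in [0, 1]."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1) / np.log(probs.shape[1])


def pseudo_labels(probs, rng, mix=None):
    """Hard argmax labels; with probability mix[i], a uniform random label.

    The mixing rule is an assumption made for illustration only.
    """
    hard = probs.argmax(axis=1)
    if mix is None:
        return hard
    random_labels = rng.integers(0, probs.shape[1], size=len(probs))
    use_random = rng.random(len(probs)) < mix
    return np.where(use_random, random_labels, hard)


def pseudo_calibrated_threshold(probs, alpha, slack=0.0, mix=None, seed=0):
    """Conformal threshold from pseudo-labeled target data, inflated by slack."""
    rng = np.random.default_rng(seed)
    labels = pseudo_labels(probs, rng, mix)
    # Nonconformity score of each calibration point at its pseudo-label.
    scores = 1.0 - probs[np.arange(len(probs)), labels]
    return conformal_quantile(scores, alpha) + slack


def prediction_sets(probs, threshold):
    """Boolean (n, K) mask: label y is in the set if 1 - p_y(x) <= threshold."""
    return (1.0 - probs) <= threshold


# Synthetic softmax outputs standing in for a classifier on unlabeled target data.
rng = np.random.default_rng(0)
logits = 3.0 * rng.normal(size=(500, 10))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

q_hard = pseudo_calibrated_threshold(probs, alpha=0.1)
q_relaxed = pseudo_calibrated_threshold(probs, alpha=0.1, slack=0.05)
q_mixed = pseudo_calibrated_threshold(
    probs, alpha=0.1, mix=normalized_entropy(probs)
)
sets_relaxed = prediction_sets(probs, q_relaxed)
```

In this sketch a larger slack can only enlarge the prediction sets. Replacing some hard labels with random ones can only raise the calibration scores, so the mixed threshold is never below the hard pseudo-label threshold. How much slack or mixing is needed to reach a prescribed target coverage is what the paper's bound addresses and is not derived here.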
