Generalization and Informativeness of Weighted Conformal Risk Control Under Covariate Shift
Matteo Zecchin, Fredrik Hellström, Sangwoo Park, Shlomo Shamai, Osvaldo Simeone
TL;DR
This work analyzes the generalization and efficiency of Weighted Conformal Risk Control (W-CRC) under covariate shift, linking the average size of the prediction sets to the base predictor's generalization gap, the magnitude of the covariate shift, and the data-splitting hyperparameters. It derives an information-theoretic bound that shows how the calibration set size, the test-time likelihood ratios, and the split between training and calibration data influence set informativeness. The results give practical guidance for allocating data between training and calibration, especially under larger shifts, and are validated on fingerprinting-based localization with RSSI features. Together, they clarify when W-CRC remains informative while maintaining reliability under distribution shift.
Abstract
Predictive models are often required to produce reliable predictions under statistical conditions that do not match those of the training data. A common type of training-testing mismatch is covariate shift, where the conditional distribution of the target variable given the input features remains fixed, while the marginal distribution of the inputs changes. Weighted conformal risk control (W-CRC) uses data collected during the training phase to convert point predictions into prediction sets with valid risk guarantees at test time despite the presence of a covariate shift. However, while W-CRC provides statistical reliability, its efficiency, measured by the size of the prediction sets, can only be assessed at test time. In this work, we relate the generalization properties of the base predictor to the efficiency of W-CRC under covariate shift. Specifically, we derive a bound on the inefficiency of the W-CRC predictor that depends on algorithmic hyperparameters and task-specific quantities available at training time. This bound offers insight into the relationships between the informativeness of the prediction sets, the extent of the covariate shift, and the sizes of the calibration and training sets. Experiments on fingerprinting-based localization validate the theoretical results.
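For readers unfamiliar with the calibration step, the following is a minimal sketch of how a weighted conformal risk control threshold can be selected. It follows the standard likelihood-ratio-weighted formulation and is not taken from the paper. The function name `weighted_crc_threshold`, the threshold grid, the interval-style prediction set, and the assumption that the likelihood ratios are known are all illustrative choices.

```python
import numpy as np

def weighted_crc_threshold(losses, cal_weights, test_weight, lambdas,
                           alpha, loss_bound=1.0):
    """Pick the smallest threshold whose weighted risk bound is at most alpha.

    losses      : (n, K) array; losses[i, k] is the loss of calibration point i
                  at lambdas[k], assumed non-increasing in k and <= loss_bound.
    cal_weights : (n,) likelihood ratios w(x_i) = p_test(x_i) / p_cal(x_i)
                  (assumed known here; in practice they are often estimated).
    test_weight : likelihood ratio w(x) at the test input.
    lambdas     : (K,) increasing grid of candidate thresholds.
    """
    losses = np.asarray(losses, dtype=float)
    cal_weights = np.asarray(cal_weights, dtype=float)
    lambdas = np.asarray(lambdas, dtype=float)

    total = cal_weights.sum() + test_weight
    # Weighted empirical risk, with the test point charged the worst-case loss.
    bound = (cal_weights @ losses + test_weight * loss_bound) / total

    feasible = np.flatnonzero(bound <= alpha)
    if feasible.size == 0:
        # No threshold on the grid is certified: fall back to the largest set.
        return float(lambdas[-1])
    return float(lambdas[feasible[0]])

# Toy example: the set is [f(x) - lam, f(x) + lam] and the loss is miscoverage.
residuals = np.array([0.1, 0.2, 0.3, 0.4])           # |y_i - f(x_i)| on calibration data
lambdas = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5])
losses = (residuals[:, None] > lambdas[None, :]).astype(float)
cal_weights = np.ones(4)

lam_mild = weighted_crc_threshold(losses, cal_weights, 1.0, lambdas, alpha=0.5)
lam_strong = weighted_crc_threshold(losses, cal_weights, 3.0, lambdas, alpha=0.5)
print(lam_mild, lam_strong)
```

In this toy example, a larger likelihood ratio at the test input leads to a larger threshold and therefore a wider prediction set. This is the kind of shift-dependent loss of efficiency that the bound in the abstract is meant to capture.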
