Ensuring Calibration Robustness in Split Conformal Prediction Under Adversarial Attacks
Xunlei Qian, Yue Xing
TL;DR
The paper studies how split conformal prediction behaves under adversarial perturbations. It addresses both coverage validity and prediction-set efficiency when such perturbations induce a distribution shift. It develops a theoretical framework showing how calibration-time perturbations influence test-time coverage. It proves that tolerance bands exist across a range of test-time attack strengths, and it shows that adversarial training reduces conformal set sizes while preserving coverage. Empirically, it validates the theory on CIFAR, MNIST, and TinyImageNet with ResNet and ViT models. The experiments show three things: coverage varies monotonically with calibration-time attack strength, coverage stays within tolerance bands under attack, and adversarial training yields smaller sets. The results offer practical guidance for deploying conformal prediction in adversarial and high-stakes settings, balancing robustness and informativeness.
Abstract
Conformal prediction (CP) provides distribution-free, finite-sample coverage guarantees but critically relies on exchangeability, a condition often violated under distribution shift. We study the robustness of split conformal prediction under adversarial perturbations at test time, focusing on both coverage validity and the resulting prediction set size. Our theoretical analysis characterizes how the strength of adversarial perturbations applied during calibration affects coverage guarantees under adversarial test conditions. We further examine the impact of adversarial training at the model-training stage. Extensive experiments support our theory: (i) prediction coverage varies monotonically with the calibration-time attack strength, so a nonzero calibration-time attack can be used to control coverage predictably under adversarial tests; (ii) target coverage can hold over a range of test-time attacks: with a suitable calibration attack, coverage stays within any chosen tolerance band across a contiguous range of perturbation levels; and (iii) adversarial training produces tighter prediction sets that retain high informativeness.
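The following is a minimal sketch of the split conformal pipeline referred to above. It assumes the nonconformity scores are already computed, for example one minus the softmax probability of a label. The attack model is a hypothetical toy stand-in, not the authors' method. In this toy model, an attack of strength eps raises the true-label score by eps times a per-example sensitivity. A real experiment would instead run an actual adversarial attack such as PGD against the classifier. The helper names (`conformal_threshold`, `attacked_scores`, `toy_scores`) and the synthetic uniform scores are illustrative assumptions. The sketch only shows qualitatively why claim (i) can arise. It does not reproduce any result from the paper.

```python
import math
import random

def conformal_threshold(cal_scores, alpha):
    # Split-CP quantile: the ceil((n + 1)(1 - alpha))-th smallest calibration score.
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points: every label is included
    return sorted(cal_scores)[k - 1]

def prediction_set(label_scores, threshold):
    # Keep every label whose nonconformity score does not exceed the threshold.
    return [label for label, s in enumerate(label_scores) if s <= threshold]

def coverage(true_label_scores, threshold):
    # Fraction of test points whose true label lands in the prediction set.
    return sum(s <= threshold for s in true_label_scores) / len(true_label_scores)

def attacked_scores(clean, sensitivity, eps):
    # HYPOTHETICAL toy attack model: a perturbation of strength eps raises the
    # true-label score by eps * sensitivity (a stand-in for a real attack).
    return [c + eps * d for c, d in zip(clean, sensitivity)]

def toy_scores(n, rng):
    # Synthetic clean scores and per-example sensitivities, both uniform on [0, 1].
    return [rng.random() for _ in range(n)], [rng.random() for _ in range(n)]

rng = random.Random(0)
alpha = 0.1
cal_clean, cal_sens = toy_scores(2000, rng)
test_clean, test_sens = toy_scores(5000, rng)

# Fix the test-time attack strength and sweep the calibration-time strength.
eps_test = 0.5
test_true = attacked_scores(test_clean, test_sens, eps_test)

coverages = []
for eps_cal in [0.0, 0.25, 0.5, 0.75, 1.0]:
    tau = conformal_threshold(attacked_scores(cal_clean, cal_sens, eps_cal), alpha)
    coverages.append(coverage(test_true, tau))
    print(f"eps_cal={eps_cal:.2f}  threshold={tau:.3f}  coverage={coverages[-1]:.3f}")

# Example prediction set for one test point with three candidate labels.
print(prediction_set([0.2, 0.9, 0.4], 0.5))
```

In this toy model, the attack raises every calibration score pointwise as eps_cal grows. The calibration quantile, and with it the test coverage, therefore cannot decrease, which mirrors the monotonicity in (i). Calibrating with eps_cal = 0 against attacked test points undercovers. Matching eps_cal to eps_test makes calibration and test scores exchangeable again, so the usual 1 - alpha guarantee applies. Larger eps_cal values overcover at the cost of larger sets.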
