Ensuring Calibration Robustness in Split Conformal Prediction Under Adversarial Attacks

Xunlei Qian, Yue Xing

TL;DR

The paper tackles how split conformal prediction behaves under adversarial perturbations, addressing both validity and efficiency when distribution shift occurs. It develops a theoretical framework showing that calibration-time perturbations influence test-time coverage, proves the existence of tolerance bands across a range of test attacks, and demonstrates that adversarial training reduces conformal set sizes while preserving coverage. Empirically, it validates the theory on CIFAR/MNIST/TinyImageNet with ResNet and ViT models, showing monotone coverage with calibration strength, robust bands under attack, and smaller sets under adversarial training. The results offer practical guidance for deploying conformal prediction in adversarial and high-stakes settings, balancing robustness and informativeness.

Abstract

Conformal prediction (CP) provides distribution-free, finite-sample coverage guarantees but critically relies on exchangeability, a condition often violated under distribution shift. We study the robustness of split conformal prediction under adversarial perturbations at test time, focusing on both coverage validity and the resulting prediction set size. Our theoretical analysis characterizes how the strength of adversarial perturbations during calibration affects coverage guarantees under adversarial test conditions. We further examine the impact of adversarial training at the model-training stage. Extensive experiments support our theory: (i) prediction coverage varies monotonically with the calibration-time attack strength, so a nonzero calibration-time attack can be used to predictably control coverage under adversarial tests; (ii) target coverage can hold over a range of test-time attacks: with a suitable calibration attack, coverage stays within any chosen tolerance band across a contiguous set of perturbation levels; and (iii) adversarial training produces tighter prediction sets that retain high informativeness.
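To make the calibration step concrete, the following is a minimal sketch of split conformal prediction for classification, not the paper's implementation. It assumes the score $1 - \hat{p}_y(x)$ (one minus the softmax probability of the true label), which the paper may or may not use; the names `conformal_quantile`, `split_cp_sets`, and `toy_probs` are hypothetical, and the data are synthetic. In the adversarial setting described above, `cal_probs` would be the model's outputs on calibration inputs perturbed at the chosen calibration-time attack strength, and `test_probs` the outputs on attacked test inputs; the attack itself is not modeled here.

```python
import numpy as np

def conformal_quantile(cal_scores, alpha):
    """Return the ceil((n+1)(1-alpha))-th smallest calibration score."""
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: the set must be trivial.
        return np.inf
    return np.sort(cal_scores)[k - 1]

def split_cp_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Boolean mask of shape (n_test, n_classes); True marks labels in the set."""
    # Assumed nonconformity score: 1 - probability assigned to the true label.
    cal_scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    q_hat = conformal_quantile(cal_scores, alpha)
    # A label enters the set when its score does not exceed the threshold.
    return (1.0 - test_probs) <= q_hat

# Synthetic stand-in for model outputs (an assumption, not data from the paper).
rng = np.random.default_rng(0)

def toy_probs(n, num_classes):
    labels = rng.integers(0, num_classes, size=n)
    logits = rng.normal(size=(n, num_classes))
    logits[np.arange(n), labels] += 2.0  # make the true class more likely
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return probs, labels

cal_probs, cal_labels = toy_probs(1000, 10)
test_probs, test_labels = toy_probs(2000, 10)

sets = split_cp_sets(cal_probs, cal_labels, test_probs, alpha=0.1)
coverage = sets[np.arange(len(test_labels)), test_labels].mean()
avg_size = sets.sum(axis=1).mean()
print(f"empirical coverage: {coverage:.3f}, average set size: {avg_size:.2f}")
```

Because the calibration and test samples here are drawn from the same distribution, empirical coverage should land near the target $1 - \alpha$. When the two are perturbed at different strengths, exchangeability no longer holds, which is the regime the paper analyzes.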

Paper Structure

This paper contains 27 sections, 5 theorems, 37 equations, 4 figures, 1 table, 1 algorithm.

Key Result

Proposition 1

Under the paper's stated assumption, if the sample data $\{(x_{i},y_{i})\}_{i=1}^{n+1}$ are exchangeable, then the prediction set $C(\cdot)$ given by Split CP satisfies:

Figures (4)

  • Figure 1: ViT Accuracy
  • Figure 2: ViT Set Size
  • Figure 3: Resnet50d Accuracy
  • Figure 4: Resnet50d Set Size

Theorems & Definitions (8)

  • Remark 1
  • Proposition 1
  • Theorem 1
  • Theorem 2
  • Example 1
  • Theorem 3
  • Lemma 1
  • Proof 1