The Pitfalls and Promise of Conformal Inference Under Adversarial Attacks

Ziquan Liu, Yufei Cui, Yan Yan, Yi Xu, Xiangyang Ji, Xue Liu, Antoni B. Chan

TL;DR

This work studies how to obtain reliable uncertainty quantification for adversarially robust models. It evaluates conformal prediction (CP) under standard adversarial attacks and shows that CP is inefficient with non-robust models and even with some adversarially trained (AT) models. It introduces Uncertainty-Reducing AT (AT-UR), which combines entropy minimization (to lower predictive entropy) with a Beta-weighting loss (to focus training on the promising region of the true class probability ranking, TCPR), and proves a population-level bound linking the Beta-weighted loss to the CP prediction set size (PSS). The approach is validated on four image datasets across three common AT baselines (AT, FAT, TRADES), showing substantial CP-efficiency gains (reduced PSS) while preserving coverage, with some trade-off in Top-1 accuracy. The result is a practical framework for CP-aware adversarial training, relevant to safety-critical deployments that need reliable uncertainty estimates in attack-prone settings.

Abstract

In safety-critical applications such as medical imaging and autonomous driving, where decisions have profound implications for patient health and road safety, it is imperative to maintain both high adversarial robustness against potential attacks and reliable uncertainty quantification in decision-making. Although extensive research has focused on enhancing adversarial robustness through various forms of adversarial training (AT), a notable knowledge gap remains concerning the uncertainty inherent in adversarially trained models. To address this gap, this study investigates the uncertainty of deep learning models by examining the performance of conformal prediction (CP) under the standard adversarial attacks used in the adversarial defense community. We first show that existing CP methods do not produce informative prediction sets under the commonly used $l_{\infty}$-norm bounded attack if the model is not adversarially trained, which underscores the importance of adversarial training for CP. We next demonstrate that the prediction set size (PSS) of CP using models trained with AT variants is often worse than using standard AT, which motivates our study of CP-efficient AT for improved PSS. We propose to optimize a Beta-weighting loss with an entropy minimization regularizer during AT to improve CP-efficiency, where our theoretical analysis shows the Beta-weighting loss to be an upper bound of PSS at the population level. Moreover, our empirical study on four image classification datasets across three popular AT baselines validates the effectiveness of the proposed Uncertainty-Reducing AT (AT-UR).
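
As a rough illustration of the quantities named in the abstract, the sketch below runs split conformal prediction on class-probability vectors and reports coverage and PSS. It is a minimal sketch under stated assumptions: it uses the simple score 1 - p(true class) rather than any particular CP method evaluated in the paper, and the function names (`calibrate_threshold`, `prediction_set`, `coverage_and_pss`) are hypothetical.

```python
import math

def calibrate_threshold(cal_probs, cal_labels, alpha=0.1):
    """Conformal threshold from held-out calibration data (target coverage 1 - alpha)."""
    # Assumed nonconformity score: one minus the probability of the true class.
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: every class is included.
        return float("inf")
    return scores[rank - 1]

def prediction_set(probs, tau):
    """All classes whose score 1 - p does not exceed the threshold tau."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= tau]

def coverage_and_pss(test_probs, test_labels, tau):
    """Empirical coverage and average prediction set size (PSS) on a test set."""
    sets = [prediction_set(p, tau) for p in test_probs]
    coverage = sum(y in s for s, y in zip(sets, test_labels)) / len(sets)
    pss = sum(len(s) for s in sets) / len(sets)
    return coverage, pss
```

Under this score, a model whose probability vectors are nearly uniform under attack needs a threshold close to 1 to reach the target coverage, and the sets then contain almost every class. That is the uninformative regime the abstract describes for non-robust models.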

Paper Structure

This paper contains 24 sections, 5 theorems, 30 equations, 10 figures, 9 tables.

Key Result

Theorem 5.1

(Learning bound for the expected size of CP prediction sets) Let $L_{\text{Beta}}(f) := \sum_{k=1}^K \sigma_k \cdot \mathbb{E}[\ell(f(X), Y) \mid r_f(X,Y) = k]$, where $\sigma_k \sim p_{\text{Beta}}(k/(K+1); a, b)$ with $a=1.1, b=5$. We have the following inequality where $|\mathcal{C}_f(X)|$ is the cardinality of the prediction set $\mathcal{C}_f(X)$ for a classifier $f$ with input $X$ and $r_f(X,Y)$ ...
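
To make the weights $\sigma_k$ concrete, the following sketch evaluates the Beta density at the grid points $k/(K+1)$ with $a=1.1, b=5$ and forms an empirical analogue of $L_{\text{Beta}}$. It is a minimal sketch under stated assumptions, not the authors' implementation: the loss $\ell$ is taken to be cross-entropy, $r_f(X,Y)$ is taken to be the 1-based rank of the true class by descending predicted probability, ranks with no samples contribute zero, and all function names are hypothetical.

```python
import math

def beta_pdf(x, a, b):
    """Density of the Beta(a, b) distribution at x in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x))

def beta_weights(num_classes, a=1.1, b=5.0):
    """sigma_k = p_Beta(k / (K + 1); a, b) for ranks k = 1..K."""
    K = num_classes
    return [beta_pdf(k / (K + 1), a, b) for k in range(1, K + 1)]

def true_class_rank(probs, label):
    """Assumed r_f: 1-based rank of the true class, sorted by descending probability."""
    return 1 + sum(p > probs[label] for p in probs)

def beta_weighted_loss(batch_probs, labels, a=1.1, b=5.0):
    """Sum over observed ranks k of sigma_k times the mean loss of samples with rank k."""
    sigma = beta_weights(len(batch_probs[0]), a, b)
    losses_by_rank = {}
    for probs, y in zip(batch_probs, labels):
        k = true_class_rank(probs, y)
        # Assumed loss: cross-entropy of the true class.
        losses_by_rank.setdefault(k, []).append(-math.log(probs[y]))
    return sum(sigma[k - 1] * sum(v) / len(v) for k, v in losses_by_rank.items())
```

With $a=1.1$ and $b=5$ the density peaks near the low end of $(0, 1)$, so for a modest number of classes such as $K=10$ the weights decrease as the rank of the true class grows.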

Figures (10)

  • Figure 1: The proposed uncertainty-reducing adversarial training (AT-UR) improves the CP-efficiency of existing adversarial training methods such as AT, FAT and TRADES. (1) AT improves the Top-1 robust accuracy of a standard model; (2) CP generates a prediction set with a pre-specified coverage guarantee for an input image, but for models that are not adversarially trained, CP fails to generate informative prediction sets because the PSS is almost equal to the number of classes when the models are attacked (Fig. \ref{fig:pitfalls_std_model}); (3) When CP is applied to an adversarially trained model, the prediction set size is generally large, leading to inefficient CP. Our AT-UR substantially improves the CP-efficiency of existing AT methods.
  • Figure 2: The performance of three representative CP methods using non-robust models under the standard adversarial attacks used in the adversarial defense community. The red line denotes the mean and standard deviation of the metric. For comparison, the average PSS for normal images is 1.03 and 2.39 for CIFAR10 and CIFAR100, respectively. See Sec. \ref{sec:experiment:setting} for details of the experiment.
  • Figure 3: (Left): The kernel density estimate of the entropy of the predictive distribution on adversarial test sets. (Right): Box plot of the PSS of three AT baselines and AT-EM. AT-EM effectively controls prediction entropy and improves CP-efficiency. See Tab. \ref{tab:popular_AT} and Tab. \ref{tab:main} for their coverages.
  • Figure 4: The Beta distribution density function $\tilde{p}_{\text{Beta}}$ used in our experiment. This weighting scheme increases the importance of samples in the promising region.
  • Figure 5: The CP curve of coverage versus PSS. Each point on the curve is obtained by adjusting the threshold $\hat{\tau}_{\text{cal}}$. We plot 15 CP curves (opaque line) and their average (solid line) for each method. The red vertical line indicates the operating point for 90% coverage. The y-scale is chosen so that the differences between methods are easier to see.
  • ...and 5 more figures
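
The CP curve described for Figure 5 can be traced by sweeping the threshold and recording coverage and PSS at each value. The sketch below is a minimal illustration under stated assumptions: it uses the simple score 1 - p(class), which may differ from the CP methods used in the paper, and `cp_curve` is a hypothetical helper name.

```python
def cp_curve(test_probs, test_labels, thresholds):
    """Return one (threshold, coverage, average set size) tuple per threshold."""
    n = len(test_probs)
    curve = []
    for tau in thresholds:
        covered = 0
        total_size = 0
        for probs, y in zip(test_probs, test_labels):
            # Assumed score: a class enters the set when 1 - p <= tau.
            in_set = [1.0 - p <= tau for p in probs]
            covered += in_set[y]
            total_size += sum(in_set)
        curve.append((tau, covered / n, total_size / n))
    return curve
```

Both coverage and PSS are non-decreasing in the threshold, so a method with better CP-efficiency reaches a given coverage (for example 90%) at a smaller PSS.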

Theorems & Definitions (9)

  • Theorem 5.1
  • Theorem 4.1
  • proof
  • Lemma 4.2
  • Lemma 4.3
  • proof
  • proof
  • Lemma 4.4
  • proof