Improving Accuracy-robustness Trade-off via Pixel Reweighted Adversarial Training

Jiacheng Zhang; Feng Liu; Dawei Zhou; Jingfeng Zhang; Tongliang Liu

TL;DR

This work addresses the accuracy-robustness trade-off in adversarial training by showing that pixel regions contribute unequally to robustness and accuracy. It introduces PART, a CAM-guided, pixel-based reweighting framework that assigns full perturbation budget to important regions and reduced budgets to less influential ones via Pixel-AG. Empirically, PART improves natural accuracy with minimal robustness loss on CIFAR-10, SVHN, and TinyImagenet-200 and remains compatible with standard AT methods and CAM variants, while also resisting adaptive attacks and improving corruption robustness. The approach offers a practical enhancement to robust classification and suggests broader applicability to future architectures and optimization of per-pixel perturbation budgets.

Abstract

Adversarial training (AT) trains models using adversarial examples (AEs), which are natural images modified with specific perturbations to mislead the model. These perturbations are constrained by a predefined perturbation budget $\epsilon$ and are equally applied to each pixel within an image. However, in this paper, we discover that not all pixels contribute equally to the accuracy on AEs (i.e., robustness) and accuracy on natural images (i.e., accuracy). Motivated by this finding, we propose Pixel-reweighted AdveRsarial Training (PART), a new framework that partially reduces $\epsilon$ for less influential pixels, guiding the model to focus more on key regions that affect its outputs. Specifically, we first use class activation mapping (CAM) methods to identify important pixel regions, then we keep the perturbation budget for these regions while lowering it for the remaining regions when generating AEs. In the end, we use these pixel-reweighted AEs to train a model. PART achieves a notable improvement in accuracy without compromising robustness on CIFAR-10, SVHN and TinyImagenet-200, justifying the necessity to allocate distinct weights to different pixel regions in robust classification.
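
The pixel-reweighting idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the fixed `threshold` rule for deciding which pixels count as important, and the toy CAM scores are all assumptions made for illustration. Only the general idea (full budget $\epsilon$ for important pixels, a reduced budget for the rest) comes from the text.

```python
from typing import List

Grid = List[List[float]]


def budget_map(cam: Grid, eps: float, eps_low: float, threshold: float) -> Grid:
    """Per-pixel budget: eps where the CAM score exceeds `threshold`, eps_low elsewhere.

    The thresholding rule is an assumption; the paper may select regions differently.
    """
    return [[eps if score > threshold else eps_low for score in row] for row in cam]


def project(delta: Grid, budgets: Grid) -> Grid:
    """Clip each perturbation entry to [-budget, +budget] for its own pixel."""
    return [
        [max(-b, min(b, d)) for d, b in zip(d_row, b_row)]
        for d_row, b_row in zip(delta, budgets)
    ]


# Toy 2x2 "image": only the top-left pixel is marked important (hypothetical scores).
cam_scores = [[0.9, 0.2], [0.1, 0.3]]
eps, eps_low = 8 / 255, 7 / 255
budgets = budget_map(cam_scores, eps, eps_low, threshold=0.5)

# A candidate perturbation is clipped pixel by pixel to its own budget.
delta = project([[0.05, 0.05], [-0.05, 0.01]], budgets)
```

In this sketch the only change relative to a uniform $\ell_\infty$ projection is that the clipping bound varies per pixel, so it could in principle be dropped into the projection step of an iterative attack.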

Paper Structure (31 sections, 2 theorems, 28 equations, 7 figures, 9 tables, 3 algorithms)

Introduction
Preliminaries
Pixel-reweighted Adversarial Training
Learning Objective of PART
Realization of PART
How $\epsilon$ Affects the Generation of AEs
Comparisons with Related Work
Experiments
Experiment Settings
Performance Evaluation and Analysis
Ablation Studies
Scalability and Applicability
Training Speed and Memory Consumption
Conclusion
Perturbations with $\ell_2$-norm Constraint
...and 16 more sections

Key Result

Lemma 3.1

Let $\delta_1^*$ and $\delta_2^*$ be the optimal solutions of Eq. (eq:toy). The generated AEs can be categorized into three cases: (i) The expressions of $\delta_1^*$ and $\delta_2^*$ do not contain $\epsilon_1$ and $\epsilon_2$. (ii) $\delta_1^* = \pm \epsilon_1$ and $\delta_2^* = \pm \epsilon_2$. (iii) ...

Figures (7)

Figure 1: The proof-of-concept experiment. We find that fundamental discrepancies exist among different pixel regions. Specifically, we segment each image into four equal-sized regions (i.e., ul, short for upper left; ur, short for upper right; br, short for bottom right; bl, short for bottom left) and adversarially train two ResNet-18 models [he2015deep] on CIFAR-10 [cifar] using AT [Madry2018] with the same experiment settings except for the allocation of $\epsilon$. Robustness is evaluated by $\ell_{\infty}$-norm PGD-20 [Madry2018]. With the same overall perturbation budget (i.e., $\epsilon = 6/255$ for one region and $12/255$ for the others), we find that both natural accuracy and adversarial robustness change significantly when the regional allocation of $\epsilon$ differs. For example, changing $\epsilon_{\rm{br}} = 6/255$ to $\epsilon_{\rm{ul}} = 6/255$ improves accuracy by 1.23% and robustness by 0.94%.
Figure 2: AT-based classifiers (the first row) vs. PART-based classifiers (the second row). The heatmaps are visualized by GradCAM [GradCAM]. In these heatmaps, a shift towards deeper red signifies a greater contribution to classification, so the gradation in hue highlights the pixel regions that most influence the classification results. We find that PART-based methods are indeed guided more towards leveraging semantic information in images (e.g., the horse) to make classification decisions.
Figure 3: An overview of the training procedure for PART. Compared to AT, PART leverages the power of CAM methods to identify important pixel regions. Based on the class activation map, we multiply the perturbation element-wise by a mask during the generation of AEs, keeping the perturbation budget $\epsilon$ for important pixel regions while shrinking it to $\epsilon^{\rm low}$ for the remaining regions.
Figure 4: Impact of $\epsilon^{\rm low}$ on robustness and accuracy of PART. Left: $\epsilon = 12/255$ and $\epsilon^{\rm low} \in \{11/255, 10/255, 9/255, 8/255\}$. Right: $\epsilon = 8/255$ and $\epsilon^{\rm low} \in \{7/255, 6/255, 5/255, 4/255\}$. Solid lines represent the performance of PART ($s = 1$), and dashed lines represent the performance of AT. We report the averaged results and standard deviations (i.e., shaded areas) of three runs.
Figure 5: Qualitative results of how attention heatmaps change with epoch number $\in \{30, 40, 50, 60\}$ on CIFAR-10.
...and 2 more figures
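
The regional allocation used in the proof-of-concept experiment (Figure 1) can be sketched as a per-pixel budget map. This is an illustrative reconstruction under stated assumptions, not the authors' code: the helper names `quadrant_budget_map` and `allocation` are hypothetical, and the 32x32 image size is the standard CIFAR-10 resolution rather than something stated in the captions.

```python
from typing import Dict, List

REGIONS = ("ul", "ur", "bl", "br")  # upper/bottom, left/right quadrants


def quadrant_budget_map(size: int, eps_by_region: Dict[str, float]) -> List[List[float]]:
    """Build a size x size grid where each pixel holds its quadrant's budget."""
    half = size // 2
    grid = []
    for row in range(size):
        vert = "u" if row < half else "b"
        grid.append([
            eps_by_region[vert + ("l" if col < half else "r")]
            for col in range(size)
        ])
    return grid


def allocation(reduced: str, low: float = 6 / 255, high: float = 12 / 255) -> Dict[str, float]:
    """Give one quadrant the reduced budget and the other three the full one."""
    return {name: (low if name == reduced else high) for name in REGIONS}


# Two allocations with the same overall budget but a different reduced quadrant.
eps_br = quadrant_budget_map(32, allocation("br"))
eps_ul = quadrant_budget_map(32, allocation("ul"))

# The mean per-pixel budget is identical in both cases.
mean_br = sum(map(sum, eps_br)) / 32**2
mean_ul = sum(map(sum, eps_ul)) / 32**2
```

Both maps average to the same per-pixel budget, which is the sense in which the two training runs share an overall perturbation budget while differing only in where the reduced region sits.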

Theorems & Definitions (3)

Lemma 3.1
Theorem 3.2
proof
