
Layer-Aware Analysis of Catastrophic Overfitting: Revealing the Pseudo-Robust Shortcut Dependency

Runqi Lin, Chaojian Yu, Bo Han, Hang Su, Tongliang Liu

TL;DR

This work investigates catastrophic overfitting (CO) in single-step adversarial training (AT) and shows that CO arises from layer-wise distortion: the former layers distort earlier and more severely than the latter layers, driven by pseudo-robust shortcuts. It introduces Layer-Aware Adversarial Weight Perturbation (LAP), an adaptive strategy that perturbs weights layer by layer, with perturbation strength decreasing across layers, to prevent CO efficiently while improving robustness. The authors give a PAC-Bayes bound as theoretical justification and report experiments on CIFAR-10/100 and Tiny-ImageNet, across CNNs and Vision Transformers, in which LAP prevents CO and yields stronger adversarial robustness with modest training overhead. Overall, LAP is presented as a practical remedy for CO in single-step AT that applies across architectures and datasets without the heavy cost of multi-step methods.

Abstract

Catastrophic overfitting (CO) presents a significant challenge in single-step adversarial training (AT), manifesting as highly distorted deep neural networks (DNNs) that are vulnerable to multi-step adversarial attacks. However, the underlying factors that lead to the distortion of decision boundaries remain unclear. In this work, we delve into the specific changes within different DNN layers and discover that during CO, the former layers are more susceptible, experiencing earlier and greater distortion, while the latter layers show relative insensitivity. Our analysis further reveals that this increased sensitivity in former layers stems from the formation of pseudo-robust shortcuts, which alone can impeccably defend against single-step adversarial attacks but bypass genuine-robust learning, resulting in distorted decision boundaries. Eliminating these shortcuts can partially restore robustness in DNNs from the CO state, thereby verifying that dependence on them triggers the occurrence of CO. This understanding motivates us to implement adaptive weight perturbations across different layers to hinder the generation of pseudo-robust shortcuts, consequently mitigating CO. Extensive experiments demonstrate that our proposed method, Layer-Aware Adversarial Weight Perturbation (LAP), can effectively prevent CO and further enhance robustness.
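
The idea of perturbing the former layers more strongly than the latter ones can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the linear decay in `layer_scales`, the norm-relative step in `perturb_layer`, and both function names are hypothetical and are not taken from the paper, which defines its own schedule and perturbation rule.

```python
import math


def layer_scales(num_layers, base_strength):
    """Hypothetical schedule: perturbation strength decays linearly with depth.

    The first (former) layer receives base_strength; the last (latter) layer
    receives base_strength / num_layers. The linear form is an assumption.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    return [base_strength * (num_layers - i) / num_layers
            for i in range(num_layers)]


def perturb_layer(weights, direction, scale):
    """Shift a layer's weights along `direction`.

    The step is sized relative to the weight norm (an assumed, norm-relative
    rule), so `scale` reads as a fraction of the layer's own magnitude.
    """
    w_norm = math.sqrt(sum(w * w for w in weights))
    d_norm = math.sqrt(sum(d * d for d in direction))
    if d_norm == 0.0:
        return list(weights)  # no direction, so no perturbation
    step = scale * w_norm / d_norm
    return [w + step * d for w, d in zip(weights, direction)]


# Example: four layers, strongest perturbation on the first one.
scales = layer_scales(4, 0.02)
perturbed_first = perturb_layer([3.0, 4.0], [1.0, 0.0], scales[0])
```

In a real training loop the direction would come from the loss gradient with respect to each layer's weights; here it is passed in directly so the sketch stays self-contained.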

Paper Structure

This paper contains 21 sections, 10 equations, 6 figures, 10 tables, 1 algorithm.

Figures (6)

  • Figure 1: The test accuracy of R-FGSM and R-LAP under 16/255 noise magnitude, where the solid and dashed lines denote natural and robust (PGD) accuracy, respectively.
  • Figure 2: Visualization of the loss landscape for individual layers (1st to 5th columns) and for the whole model (6th column). The upper, middle, and lower rows correspond to the stages before, during, and after CO, respectively.
  • Figure 3: Singular value of weights (convolution kernel) at different DNN layers. The blue, green, and red lines represent the model state before, during, and after CO, respectively.
  • Figure 4: Evaluating the test accuracy of a CO-affected model against single-step (FGSM) and multi-step (PGD) adversarial attack.
  • Figure 5: Visualization of the loss landscape for individual layers (1st to 5th columns) and for the whole model (6th column).
  • ...and 1 more figure