Eliminating Catastrophic Overfitting Via Abnormal Adversarial Examples Regularization

Runqi Lin, Chaojian Yu, Tongliang Liu

TL;DR

The paper proposes Abnormal Adversarial Examples Regularization (AAER), a method that explicitly regularizes the variation of abnormal adversarial examples (AAEs) to keep the classifier from becoming distorted. AAER effectively eliminates catastrophic overfitting (CO) and further boosts adversarial robustness with negligible additional computational overhead.

Abstract

Single-step adversarial training (SSAT) has demonstrated the potential to achieve both efficiency and robustness. However, SSAT suffers from catastrophic overfitting (CO), a phenomenon that leads to a severely distorted classifier, making it vulnerable to multi-step adversarial attacks. In this work, we observe that some adversarial examples generated on the SSAT-trained network exhibit anomalous behaviour: although these training samples are generated by the inner maximization process, their associated loss decreases instead. We name them abnormal adversarial examples (AAEs). Upon further analysis, we discover a close relationship between AAEs and classifier distortion, as both the number and outputs of AAEs undergo a significant variation with the onset of CO. Given this observation, we re-examine the SSAT process and find that, before the occurrence of CO, the classifier already displays a slight distortion, indicated by the presence of a few AAEs. Furthermore, directly optimizing the classifier on these AAEs accelerates its distortion, and the variation of AAEs sharply increases as a result. In such a vicious circle, the classifier rapidly becomes highly distorted and manifests CO within a few iterations. These observations motivate us to eliminate CO by hindering the generation of AAEs. Specifically, we design a novel method, termed Abnormal Adversarial Examples Regularization (AAER), which explicitly regularizes the variation of AAEs to hinder the classifier from becoming distorted. Extensive experiments demonstrate that our method can effectively eliminate CO and further boost adversarial robustness with negligible additional computational overhead.
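
The defining property of an AAE, a loss that goes down after a step meant to push it up, can be illustrated with a toy sketch. This is not the authors' implementation. The scalar `toy_loss`, its `sharpness` parameter, the step size `eps`, and the helper names are all hypothetical, and comparing the loss just before and just after the single step is an assumption about how the decrease would be measured. A larger `sharpness` stands in for a more distorted, highly non-linear loss surface.

```python
import math

def flag_abnormal(loss_before, loss_after):
    """Mark each sample whose loss dropped after the attack step (an AAE)."""
    return [after < before for before, after in zip(loss_before, loss_after)]

def toy_loss(x, sharpness):
    # Hypothetical 1-D stand-in for a per-sample loss; a larger `sharpness`
    # mimics a more distorted (non-linear) loss surface.
    return 1.0 + math.sin(sharpness * x)

def toy_grad(x, sharpness):
    # Analytic derivative of toy_loss with respect to x.
    return sharpness * math.cos(sharpness * x)

def single_step(x, eps, sharpness):
    # FGSM-style step: move by eps in the direction of the gradient sign.
    g = toy_grad(x, sharpness)
    return x + eps * math.copysign(1.0, g)

eps = 0.8                      # assumed step size, chosen for illustration
x0 = 0.0                       # the same starting input for both surfaces
sharpness_values = [1.0, 5.0]  # smooth surface vs. distorted surface

before = [toy_loss(x0, s) for s in sharpness_values]
after = [toy_loss(single_step(x0, eps, s), s) for s in sharpness_values]
mask = flag_abnormal(before, after)
print(mask)  # [False, True]
```

On the smooth surface the step raises the loss, as the inner maximization intends. On the sharper surface the same step overshoots the nearby peak and lands at a lower loss, which is the behaviour the abstract calls abnormal.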

Paper Structure (21 sections, 9 equations, 7 figures, 17 tables, 1 algorithm)

Figures (7)

  • Figure 1: The test accuracy of RS-FGSM (Wong et al., 2020; red line) and RS-AAER (green line) with 16/255 noise magnitude. The dashed and solid lines denote natural and robust (PGD-7-1) accuracy, respectively. The dashed black line marks the 9th epoch, the point at which CO occurs in RS-FGSM.
  • Figure 2: A conceptual diagram of the classifier’s decision boundary and training samples. Training samples that are normal adversarial examples (NAEs, blue) effectively mislead the classifier, while AAEs (red) do not. The left panel shows the decision boundary before optimizing AAEs, which has only a slight distortion. The middle panel shows the decision boundary after optimizing AAEs, which exacerbates the distortion and generates more AAEs.
  • Figure 3: The number, the variation of prediction confidence and logits distribution (from left to right) for NAEs, AAEs and training samples in RS-FGSM with 16/255 noise magnitude. The dashed black line marks the 9th epoch, the point at which CO occurs in the model.
  • Figure 4: Left and middle panels: the loss surfaces of AAEs and NAEs, respectively, before CO (8th epoch). Right panel: the number of AAEs and the test robustness at each iteration during CO (9th epoch). The green and red lines represent the robust accuracy and the number of AAEs, respectively.
  • Figure 5: The number, the variation of prediction confidence and logits distribution (from left to right) for NAEs, AAEs and training samples in RS-AAER with 16/255 noise magnitude.
  • ...and 2 more figures