Robust Overfitting Does Matter: Test-Time Adversarial Purification With FGSM

Linyu Tang, Lei Zhang

TL;DR

The Test-Time Pixel-Level Adversarial Purification (TPAP) method uses FGSM at test time to purify unknown adversarial perturbations from image pixels in a "counter changes with changelessness" manner, thereby enhancing the defense capability of DNNs against various unknown adversarial attacks.

Abstract

Numerous studies have demonstrated the susceptibility of deep neural networks (DNNs) to subtle adversarial perturbations, prompting the development of many advanced adversarial defense methods aimed at mitigating adversarial attacks. Current defense strategies usually train DNNs against a specific adversarial attack method and can achieve good robustness against that type of attack. Nevertheless, when evaluated on unfamiliar attack modalities, the robustness of these DNNs deteriorates markedly. Meanwhile, there is a trade-off between classification accuracy on clean examples and on adversarial examples: most defense methods sacrifice clean accuracy in order to improve the adversarial robustness of DNNs. To alleviate these problems and enhance the overall robust generalization of DNNs, we propose the Test-Time Pixel-Level Adversarial Purification (TPAP) method. This approach is based on the robust overfitting characteristic of DNNs to the fast gradient sign method (FGSM) on training and test datasets. It uses FGSM at testing time to purify unknown adversarial perturbations from image pixels in a "counter changes with changelessness" manner, thereby enhancing the defense capability of DNNs against various unknown adversarial attacks. Extensive experimental results show that our method effectively improves the overall robust generalization of DNNs, notably over previous methods.
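To make the idea of FGSM-based test-time purification concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy linear softmax classifier in place of the DNN, so that the input gradient can be written analytically. It also assumes that the label used in the FGSM step comes from a pre-classification of the incoming image (the Figure 2 caption mentions pre-classification, but the exact procedure is not spelled out here). The function names, the step direction (the standard FGSM ascent on the loss) and the value `eps = 8/255` are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def loss_grad_wrt_input(W, b, x, y):
    # Gradient of the cross-entropy loss w.r.t. the input x for a
    # linear softmax classifier with logits = x @ W + b.
    # (Toy stand-in for a DNN; hypothetical helper.)
    p = softmax(x @ W + b)
    p[np.arange(len(y)), y] -= 1.0      # dL/dlogits = softmax - onehot(y)
    return p @ W.T                      # chain rule back to the input

def fgsm_step(W, b, x, y, eps):
    # Standard FGSM: one signed-gradient step of size eps, clipped to [0, 1].
    g = loss_grad_wrt_input(W, b, x, y)
    return np.clip(x + eps * np.sign(g), 0.0, 1.0)

def purify_and_classify(W, b, x, eps):
    # Assumed test-time flow: pre-classify, apply one FGSM step using the
    # pre-classified label, then classify the processed image.
    y_pre = np.argmax(x @ W + b, axis=-1)
    x_pur = fgsm_step(W, b, x, y_pre, eps)
    y_final = np.argmax(x_pur @ W + b, axis=-1)
    return x_pur, y_pre, y_final

# Usage with random weights and inputs (illustrative only).
rng = np.random.default_rng(0)
W, b = rng.normal(size=(8, 3)), np.zeros(3)
x = rng.uniform(size=(5, 8))            # five "images" with pixels in [0, 1]
x_pur, y_pre, y_final = purify_and_classify(W, b, x, eps=8 / 255)
```

With random weights the sketch only illustrates the data flow. Per the abstract, the method relies on a network that exhibits robust overfitting to FGSM, which this toy model does not reproduce.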

Paper Structure

This paper contains 16 sections, 9 equations, 7 figures, 7 tables, 1 algorithm.

Figures (7)

  • Figure 1: (a) and (b) show the training-set accuracy on clean and adversarial examples, respectively, for ResNet18 adversarially trained on CIFAR-10 with FGSM, CW$_2$, PGD and STA attacks; (c) and (d) show the corresponding results on the CIFAR-10 test set. The horizontal axis represents epochs, while the vertical axis represents classification accuracy.
  • Figure 2: Overview of the training phase and the testing (inference) phase. Arrows indicate data flow, and double straight arrows indicate pre-classification in the testing phase.
  • Figure 3: (a) and (b) illustrate the test-time purification phase for DNNs trained with FGSM robust overfitting and with other adversarial training methods, respectively. The black curves indicate the classification boundaries, and the triangles, circles, and squares indicate three different classes.
  • Figure 4: (a) and (b) respectively represent the FGSM adversarial training process on the ResNet18 using CIFAR-10 and Tiny-ImageNet datasets. The horizontal axis represents epochs and the vertical axis represents classification accuracy. Solid lines represent the training set and dashed lines represent the test set. The orange, blue and green lines indicate the classification accuracy of the clean, FGSM and PGD adversarial examples, respectively.
  • Figure 5: (a) and (b) show the ablation study of robust training in TPAP under various perturbation strengths and batch sizes.
  • ...and 2 more figures