Adversarial Training on Purification (AToP): Advancing Both Robustness and Generalization

Guang Lin, Chao Li, Jianhai Zhang, Toshihisa Tanaka, Qibin Zhao

TL;DR

This work tackles the trade-off between robustness to known adversarial attacks and generalization to unseen attacks. It proposes Adversarial Training on Purification (AToP), which combines perturbation-destroying random transforms with adversarially fine-tuned purifiers. By freezing the classifier and optimizing the purifier with an adversarial loss, AToP learns a robust purification process that can recover clean semantics from corrupted inputs across diverse attacks. Empirical results on CIFAR-10, CIFAR-100, and ImageNette show state-of-the-art robustness and improved generalization to unseen attacks for both GAN-based and AE-based purifiers, at the price of a higher training cost due to purifier fine-tuning. The approach highlights the potential of integrating AT and AP to achieve robust purification, and motivates future work on making purifier training more efficient and scalable.

Abstract

Deep neural networks are known to be vulnerable to well-designed adversarial attacks. The most successful defense technique, adversarial training (AT), can achieve optimal robustness against particular attacks but cannot generalize well to unseen attacks. Another effective defense technique, adversarial purification (AP), can enhance generalization but cannot achieve optimal robustness. Both methods also share a common limitation: degraded standard accuracy. To mitigate these issues, we propose a novel pipeline for obtaining a robust purifier model, named Adversarial Training on Purification (AToP), which comprises two components: perturbation destruction by random transforms (RT) and fine-tuning (FT) of the purifier model with an adversarial loss. RT is essential to avoid overlearning known attacks, which allows robustness to generalize to unseen attacks, and FT is essential for improving robustness. To evaluate our method in an efficient and scalable way, we conduct extensive experiments on CIFAR-10, CIFAR-100, and ImageNette, demonstrating that our method achieves optimal robustness and generalizes to unseen attacks.
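
As a rough sketch of how the two components fit together, the purifier objective can be written as below. The notation is an assumption introduced here for illustration and is not taken from the paper: $f$ is the frozen classifier, $g_{\theta_g}$ the purifier with trainable parameters $\theta_g$, $t$ the random transform, $x'$ an adversarial example crafted from the clean input $x$ with label $y$, and $\mathcal{L}_{\mathrm{cls}}$ a classification loss such as cross-entropy. The paper's actual loss may include additional terms (for example, the purifier's original training loss) that are omitted here.

$$
\min_{\theta_g} \; \mathbb{E}_{(x, y)} \Big[ \mathcal{L}_{\mathrm{cls}}\big( f( g_{\theta_g}( t(x') ) ),\, y \big) \Big]
$$

Only $\theta_g$ is updated; the classifier's parameters stay fixed, in line with the description of freezing the classifier while fine-tuning the purifier.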

Paper Structure

This paper contains 11 sections, 9 equations, 9 figures, 9 tables, 1 algorithm.

Figures (9)

  • Figure 1: Illustration of adversarial training on purification (AToP).
  • Figure 2: Illustration of random transforms. (a) Gaussian noise is first added to preliminarily corrupt the image, which is then randomly masked. (b) Building on (a), the noisy image is randomly covered by $N$ non-overlapping masks, and the completed pixels are then combined to reconstruct the image, denoted $\hat{x}$.
  • Figure 3: Purified images for clean (top) and adversarial (bottom) examples under different random transforms.
  • Figure 4: Standard accuracy and robust accuracy of the GAN-based purifier model trained with clean examples and adversarial examples, respectively, on (a) CIFAR-10 and (b) ImageNette. The dashed line represents the accuracy of standard training w/o attacks. (c) Standard accuracy and robust accuracy of our method with adversarially trained ResNet-18 on CIFAR-10.
  • Figure 5: In each group, the first column shows the clean example (left) and the adversarial example (right), the next column shows the masked image, and the last column shows the purified image.
  • ...and 4 more figures
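
The random transform described in the Figure 2 caption (Gaussian noise, then $N$ non-overlapping masks whose completed pixels are merged into $\hat{x}$) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the image is a flat list of pixel values in [0, 1], the masks are assumed to jointly cover every pixel, and `mean_fill_purifier` is a hypothetical stand-in for the learned GAN-based or AE-based purifier. All function names and default parameters are invented for this example.

```python
import random

def gaussian_noise(pixels, sigma, rng):
    """Add zero-mean Gaussian noise, then clip back to [0, 1]."""
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in pixels]

def non_overlapping_masks(num_pixels, n_masks, rng):
    """Split pixel indices into n_masks disjoint groups.

    Assumption: together the groups cover every pixel exactly once.
    """
    order = list(range(num_pixels))
    rng.shuffle(order)
    return [set(order[k::n_masks]) for k in range(n_masks)]

def mean_fill_purifier(pixels, hidden):
    """Hypothetical purifier: fill hidden pixels with the mean of the visible ones."""
    visible = [p for i, p in enumerate(pixels) if i not in hidden]
    fill = sum(visible) / len(visible)
    return [fill if i in hidden else p for i, p in enumerate(pixels)]

def random_transform_purify(pixels, n_masks=4, sigma=0.1,
                            purifier=mean_fill_purifier, seed=0):
    """Corrupt with noise, complete each masked region, merge into x_hat."""
    if n_masks < 2 or len(pixels) < n_masks:
        raise ValueError("need at least 2 masks and at least n_masks pixels")
    rng = random.Random(seed)
    noisy = gaussian_noise(pixels, sigma, rng)
    x_hat = [0.0] * len(pixels)
    for hidden in non_overlapping_masks(len(pixels), n_masks, rng):
        completed = purifier(noisy, hidden)
        # Keep only the pixels the purifier had to complete for this mask.
        for i in hidden:
            x_hat[i] = completed[i]
    return x_hat
```

Because every pixel of the output is produced by the purifier rather than copied from the input, no part of the original perturbation passes through unchanged. In this sketch that follows from the assumption that the masks partition the image.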