Versatile Defense Against Adversarial Attacks on Image Recognition

Haibo Zhang, Zhihua Yao, Kouichi Sakurai

TL;DR

The versatile defense approach proposed in this paper requires training only one model to resist a variety of unknown adversarial attacks. The trained model raises classification accuracy from nearly zero to an average of 86%, outperforming defense methods proposed in prior studies.

Abstract

Adversarial attacks present a significant security risk to image recognition tasks. Defending against these attacks in a real-life setting can be compared to the way antivirus software works, with a key consideration being how well the defense can adapt to new and evolving attacks. Another important factor is the time and cost of training defense models and updating the model database. Training many models that are each specific to one type of attack can be time-consuming and expensive. Ideally, we should be able to train one single model that can handle a wide range of attacks. It appears that a defense method based on image-to-image translation may be capable of this. The versatile defense approach proposed in this paper requires training only one model to effectively resist various unknown adversarial attacks. The trained model improves classification accuracy from nearly zero to an average of 86%, performing better than other defense methods proposed in prior studies. When facing the PGD attack and the MI-FGSM attack, the versatile defense model even outperforms the attack-specific models trained on these two attacks. The robustness check also shows that our versatile defense model performs stably regardless of the attack strength.

Paper Structure (24 sections, 6 equations, 5 figures, 2 tables)


Figures (5)

  • Figure 1: Adversarial examples of the FGSM attack and the PGD attack. In part (a), the original image is correctly classified as a Japanese_spaniel with a confidence of 96.12%, but the perturbed image, crafted by the FGSM attack, is misclassified as a colobus with a confidence of 24.13%. In part (b), the original image is correctly classified as a junco with a confidence of 93.84%, but the perturbed image, crafted by the PGD attack, is misclassified as a house_finch with a confidence of 40.64%.
  • Figure 2: The training progress of the image reconstruction method.
  • Figure 3: PSNR and MAE values, computed against the original images, for the images subjected to six types of adversarial attacks and for the images reconstructed by the universal defense model.
  • Figure 4: Robustness check using the PGD attack and the MI-FGSM attack. To simulate different attack strengths, we gradually change the iteration number from 10 to 100, with $\epsilon$ set to 2/255, 5/255, and 10/255.
  • Figure 5: The visual display of images under six different adversarial attack scenarios. The layout is as follows: 1) the first row consists of the original clean images; 2) the second row displays the corresponding images post-attack; and 3) the third row exhibits the images after they have been processed and restored by our universal defense model.
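
The PSNR and MAE comparison described in the Figure 3 caption can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the function names `mae` and `psnr`, the 8-bit pixel range (`max_value=255`), and the toy images are all assumptions made here, since the page does not state the paper's exact conventions.

```python
import numpy as np


def mae(reference, candidate):
    """Mean absolute error between two images of the same shape."""
    ref = np.asarray(reference, dtype=np.float64)
    cand = np.asarray(candidate, dtype=np.float64)
    return float(np.mean(np.abs(ref - cand)))


def psnr(reference, candidate, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images.

    max_value is the largest possible pixel value (assumed 255 here).
    """
    ref = np.asarray(reference, dtype=np.float64)
    cand = np.asarray(candidate, dtype=np.float64)
    mse = float(np.mean((ref - cand) ** 2))
    if mse == 0.0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))


# Toy data (hypothetical): a flat gray "clean" image, an "attacked" copy
# shifted by 10 gray levels, and a "restored" copy shifted by only 2.
clean = np.full((32, 32, 3), 100.0)
attacked = clean + 10.0
restored = clean + 2.0

for name, image in [("attacked", attacked), ("restored", restored)]:
    print(f"{name}: MAE={mae(clean, image):.2f}, PSNR={psnr(clean, image):.2f} dB")
```

Both metrics compare an image against the clean original: a lower MAE and a higher PSNR for the restored image than for the attacked one indicate that the reconstruction moved the image back toward the original.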