Adversarial Machine Learning: Attacking and Safeguarding Image Datasets

Koushik Chowdhury

TL;DR

CNNs trained on standard image datasets are vulnerable to adversarial perturbations produced by the Fast Gradient Sign Method (FGSM), $x_{adv} = x + \epsilon \cdot \text{sign}(\nabla_x J(\theta, x, y))$, where $\epsilon$ sets the size of the perturbation and therefore how far accuracy degrades. The paper evaluates four datasets (CIFAR-10, ImageNet, MNIST, Fashion-MNIST), reports notable training/test performance gaps, and shows that adversarial training (retraining on a mix of clean and adversarial examples) improves robustness but does not fully prevent misclassification under attack. FGSM sharply reduces test accuracy on all four datasets, with larger drops on CIFAR-10 and ImageNet, while MNIST and Fashion-MNIST are comparatively more resilient. Overall, adversarial training is a meaningful defense, but the results point to the need for broader, more effective strategies to make CNN-based vision systems robust in real-world use.
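
The FGSM update can be illustrated with a minimal sketch. It assumes a three-feature logistic-regression classifier in place of the paper's CNNs, so that the input gradient $\nabla_x J$ has the closed form $(p - y)\,w$ for binary cross-entropy. The weights, the input, the helper names, and the clipping to $[0, 1]$ are illustrative assumptions, not taken from the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Probability of class 1 under a logistic model (stand-in for a CNN)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def loss_grad_wrt_input(w, b, x, y):
    """Gradient of binary cross-entropy J with respect to the input x.

    For a logistic model this is (p - y) * w.
    """
    p = predict_proba(w, b, x)
    return [(p - y) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, epsilon):
    """x_adv = x + epsilon * sign(grad_x J), then clipped to [0, 1].

    The clipping is an assumption (keeps pixel values valid); it is not
    part of the formula quoted in the text.
    """
    grad = loss_grad_wrt_input(w, b, x, y)
    x_adv = [xi + epsilon * sign(g) for xi, g in zip(x, grad)]
    return [min(1.0, max(0.0, v)) for v in x_adv]


# Hypothetical model and input, chosen only for illustration.
w = [2.0, -3.0, 1.0]
b = 0.0
x = [0.6, 0.2, 0.5]
y = 1  # true label

x_adv = fgsm(w, b, x, y, epsilon=0.2)
print(f"clean p(class 1)       = {predict_proba(w, b, x):.3f}")      # 0.750
print(f"adversarial p(class 1) = {predict_proba(w, b, x_adv):.3f}")  # 0.475
```

In this toy case every feature moves by at most $\epsilon = 0.2$, yet the predicted class flips from 1 to 0, which is the effect the TL;DR describes at the scale of whole test sets.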

Abstract

This paper examines the vulnerability of convolutional neural networks (CNNs) to adversarial attacks and explores a method for safeguarding them. CNNs were implemented on four of the most common image datasets, namely CIFAR-10, ImageNet, MNIST, and Fashion-MNIST, and achieved high baseline accuracy. To assess the robustness of these models, the Fast Gradient Sign Method (FGSM) was used: an attack that lowers a model's accuracy by adding a very small perturbation to the input image. To counter FGSM, a safeguarding approach was applied in which the models are retrained on both clean and adversarial (perturbed) images to increase their resistance. FGSM is then applied again, this time to the adversarially trained models, to measure how far their accuracy drops and thereby evaluate the effectiveness of the defense. The results suggest that adversarial training achieves a substantial degree of robustness, but some performance is still lost under adversarial perturbations. This work emphasizes the need for better defenses for models deployed in real-world, adversarial settings.
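
The attack, retrain, re-attack procedure described in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: a two-feature logistic-regression model stands in for the CNNs, the dataset is a synthetic grid of points, and every function name and hyperparameter (`train`, `accuracy`, `epsilon=0.1`, the learning rate, the epoch counts) is hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fgsm(w, b, x, y, epsilon):
    """FGSM for a logistic model: grad_x J = (p - y) * w; clip to [0, 1]."""
    p = predict_proba(w, b, x)
    signs = [((p - y) * wi > 0) - ((p - y) * wi < 0) for wi in w]
    return [min(1.0, max(0.0, xi + epsilon * s)) for xi, s in zip(x, signs)]


def sgd_step(w, b, x, y, lr):
    """One gradient step on binary cross-entropy for a single example."""
    p = predict_proba(w, b, x)
    return [wi - lr * (p - y) * xi for wi, xi in zip(w, x)], b - lr * (p - y)


def train(data, epochs, lr=0.5, epsilon=None, w=None, b=0.0):
    """Train on clean data; if epsilon is set, also train on FGSM examples.

    Adversarial examples are crafted against the current weights, so each
    clean example is paired with a fresh adversarial copy.
    """
    w = list(w) if w is not None else [0.0] * len(data[0][0])
    for _ in range(epochs):
        for x, y in data:
            if epsilon is not None:
                w, b = sgd_step(w, b, fgsm(w, b, x, y, epsilon), y, lr)
            w, b = sgd_step(w, b, x, y, lr)
    return w, b


def accuracy(w, b, data, epsilon=0.0):
    """Accuracy on clean inputs, or under FGSM when epsilon > 0."""
    correct = 0
    for x, y in data:
        x_eval = fgsm(w, b, x, y, epsilon) if epsilon > 0 else x
        correct += int((predict_proba(w, b, x_eval) >= 0.5) == bool(y))
    return correct / len(data)


# Synthetic, well-separated toy data (an assumption, not a real image set).
grid = [i / 10 for i in range(4)]
class0 = [([u, v], 0) for u in grid for v in grid]
class1 = [([0.7 + u, 0.7 + v], 1) for u in grid for v in grid]
data = [sample for pair in zip(class0, class1) for sample in pair]

w0, b0 = train(data, epochs=100)                             # baseline
w1, b1 = train(data, epochs=100, epsilon=0.1, w=w0, b=b0)    # retrain on mix

for name, (w, b) in {"baseline": (w0, b0), "adv-trained": (w1, b1)}.items():
    print(name, "clean:", accuracy(w, b, data),
          "under FGSM:", accuracy(w, b, data, epsilon=0.1))
```

The sketch reproduces only the structure of the experiment: train a baseline, attack it, retrain on clean plus adversarial inputs, and attack again. The toy data is far easier than the image benchmarks, so the numbers it prints say nothing about the accuracy drops the paper reports.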

Paper Structure

This paper contains 22 sections, 1 equation, 3 figures, 3 tables, 2 algorithms.

Figures (3)

  • Figure 1: CIFAR-10: Effect after CNN and FGSM
  • Figure 2: Fashion-MNIST: Effect after CNN and FGSM
  • Figure 3: Accuracy Comparison Before and After FGSM and Adversarial Training