How Deep Learning Sees the World: A Survey on Adversarial Attacks & Defenses

Joana C. Costa, Tiago Roxo, Hugo Proença, Pedro R. M. Inácio

TL;DR

This survey addresses the vulnerability of deep neural networks to adversarial perturbations in visual tasks, organizing attacks by attacker capacity and defenses into six domains, with a special focus on Vision Transformers. It compiles a comprehensive taxonomy of white-box, universal, black-box, and Auto-Attack–style attacks, and synthesizes defenses including adversarial training, training-process modifications, supplementary networks, architecture changes, validation, and purification. The work consolidates datasets and evaluation metrics, reporting state-of-the-art results on CIFAR-10/100 and ImageNet and outlining open issues such as robust evaluation across black-box and real-world scenarios. These insights offer practical guidance for deploying robust models and direct future research toward scalable, real-time defenses and standardized benchmarking.

Abstract

Deep Learning is currently used to perform multiple tasks, such as object recognition, face recognition, and natural language processing. However, Deep Neural Networks (DNNs) are vulnerable to perturbations that alter the network prediction (adversarial examples), raising concerns regarding their usage in critical areas, such as self-driving vehicles, malware detection, and healthcare. This paper compiles the most recent adversarial attacks, grouped by the attacker capacity, and modern defenses clustered by protection strategies. We also present the new advances regarding Vision Transformers, summarize the datasets and metrics used in the context of adversarial settings, and compare the state-of-the-art results under different attacks, finishing with the identification of open issues.

Paper Structure (34 sections, 13 equations, 15 figures, 7 tables)

Figures (15)

  • Figure 1: Schematic example of the Convolutional Neural Networks mechanism to classify images.
  • Figure 2: Schematic example of a simplified vision transformer used to classify images.
  • Figure 3: Adversarial examples created using different state-of-the-art adversarial attacks. The first column shows the original image; the second shows the perturbation used to generate the adversarial images in the third column. The images were resized for better visualization. Images taken from [Szegedy2014IntriguingPO, MoosaviDezfooli2016DeepFoolAS, Dabouei2020SmoothFoolAE]. The first perturbation follows the edges of the building, the second is concentrated in the area of the whale, and the third is smoother and covers a larger area.
  • Figure 4: Geometric representation of the $l_0$, $l_2$, and $l_\infty$ norms, from left to right, respectively.
  • Figure 5: Schematic overview of an Adversarial Attack under White-box Settings (left) and Black-box Settings (right). The first one uses the classifier predictions and network gradients to create perturbations (similar to noise), which can fool this classifier. These perturbations are added to the original images, creating adversarial images, which are fed to the network and cause misclassification. In the Black-box Settings, the same process is applied to a known classifier, and the obtained images are used to attack another classifier (represented as Target Architecture).
  • ...and 10 more figures
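
The captions of Figures 4 and 5 can be made concrete with a small sketch. It is an illustration under stated assumptions, not the survey's method: the linear scorer, its weights, the input, and the budget `epsilon` are all hypothetical, and the single gradient-sign step is just one simple way to use white-box gradient access. The code builds a perturbation from the gradient of a toy classifier, then measures it with the $l_0$, $l_2$, and $l_\infty$ norms.

```python
import math


def l0_norm(delta):
    """Count of non-zero entries: how many input values were changed."""
    return sum(1 for d in delta if d != 0)


def l2_norm(delta):
    """Euclidean length of the perturbation."""
    return math.sqrt(sum(d * d for d in delta))


def linf_norm(delta):
    """Largest absolute change applied to any single input value."""
    return max(abs(d) for d in delta)


def score(weights, bias, x):
    """Hypothetical linear classifier: positive score means class 1, else class 0."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def sign_gradient_perturbation(weights, epsilon, true_label):
    """One gradient-sign step against the toy linear model (white-box setting).

    For a linear score the gradient with respect to the input is simply
    `weights`, so the step moves every input value by `epsilon` in the
    direction that pushes the score away from the true label.
    """
    direction = -1 if true_label == 1 else 1
    return [direction * epsilon * sign(w) for w in weights]


# Illustrative values only (assumptions, not taken from the survey).
weights = [0.5, -1.0, 0.0, 2.0]
bias = -0.5
x = [1.0, 0.0, 1.0, 0.25]  # clean input, classified as class 1

delta = sign_gradient_perturbation(weights, epsilon=0.25, true_label=1)
x_adv = [xi + di for xi, di in zip(x, delta)]  # adversarial input

print("clean score:      ", score(weights, bias, x))
print("adversarial score:", score(weights, bias, x_adv))
print("l0 norm:  ", l0_norm(delta))
print("l2 norm:  ", l2_norm(delta))
print("linf norm:", linf_norm(delta))
```

With these toy values the score changes sign, so the predicted class flips, even though no input value moves by more than `epsilon`. The three norms summarize the same perturbation differently: $l_0$ counts the changed values, $l_2$ measures the overall magnitude, and $l_\infty$ bounds the largest single change. In the black-box setting of Figure 5, `x_adv` would be crafted on a known model in the same way and then fed to a different target model.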