Adversarial Examples: Attacks and Defenses for Deep Learning

Xiaoyong Yuan, Pan He, Qile Zhu, Xiaolin Li

TL;DR

This survey analyzes adversarial examples in deep learning, presenting a taxonomy along threat model, perturbation, and benchmarks. It catalogs a broad range of attack methods from gradient-based to black-box and cross-model approaches, and surveys defenses including distillation, adversarial training, detectors, reconstruction, and verification. The paper highlights applications across reinforcement learning, generative modeling, face recognition, object detection, segmentation, NLP, and malware detection, and discusses core challenges such as transferability and robust evaluation. It advocates for standardized benchmarks and broader robustness research to ensure the safety of DL systems in safety-critical contexts.

Abstract

With rapid progress and significant successes in a wide spectrum of applications, deep learning is being applied in many safety-critical environments. However, deep neural networks have recently been found vulnerable to well-designed input samples, called adversarial examples. Adversarial examples are imperceptible to humans but can easily fool deep neural networks in the testing or deployment stage. This vulnerability has become one of the major risks of applying deep neural networks in safety-critical environments, so attacks on and defenses against adversarial examples have drawn great attention. In this paper, we review recent findings on adversarial examples for deep neural networks, summarize the methods for generating adversarial examples, and propose a taxonomy of these methods. Under the taxonomy, applications of adversarial examples are investigated. We further elaborate on countermeasures for adversarial examples and explore the challenges and potential solutions.

Paper Structure

This paper contains 49 sections, 34 equations, 10 figures, 4 tables.

Figures (10)

  • Figure 1: An adversarial image generated by the Fast Gradient Sign Method [goodfellow2014explaining]. Left: a clean image of a panda; middle: the perturbation; right: one sample adversarial image, classified as a gibbon.
  • Figure 2: Examples that are unrecognizable to humans but that deep neural networks assign to a class with high certainty ($\geq 99.6\%$) [nguyen2015deep].
  • Figure 3: A universal adversarial example fools the neural network on images [moosavi2016universal]. Left images: original labeled natural images; center image: universal perturbation; right images: perturbed images with wrong labels.
  • Figure 4: Adversarial attacks on autoencoders [tabacof2016adversarial]. Perturbations are added to the input of the encoder. After encoding and decoding, the decoder outputs an adversarial image presenting an incorrect class.
  • Figure 5: An example of an adversarial eyeglass frame used against a face recognition system [sharif2016accessorize].
  • ...and 5 more figures
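
The caption of Figure 1 refers to the Fast Gradient Sign Method, which perturbs an input by a small step in the direction of the sign of the loss gradient: $x' = x + \epsilon \cdot \mathrm{sign}(\nabla_x J(\theta, x, y))$. The following is a minimal sketch of that single step, not the survey's own code. It assumes a toy logistic-regression classifier in place of a deep network so that the gradient can be written analytically; the weights `w`, bias `b`, input `x`, and step size `eps` are hypothetical values chosen for illustration.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping a score to a probability."""
    return 1.0 / (1.0 + np.exp(-z))


def fgsm(x, y, w, b, eps):
    """One FGSM step against a logistic-regression model (toy stand-in for a DNN).

    x: input vector, y: true label in {0, 1}, (w, b): model parameters,
    eps: L-infinity budget of the perturbation.
    """
    p = sigmoid(w @ x + b)       # predicted probability of class 1
    grad_x = (p - y) * w         # gradient of the cross-entropy loss w.r.t. x
    return x + eps * np.sign(grad_x)


# Hypothetical model and input, chosen only for illustration.
w = np.array([1.0, -2.0, 0.5])
b = 0.0
x = np.array([0.2, -0.1, 0.4])
y = 1.0

x_adv = fgsm(x, y, w, b, eps=0.25)

p_clean = sigmoid(w @ x + b)
p_adv = sigmoid(w @ x_adv + b)
print("clean prediction:", int(p_clean > 0.5))
print("adversarial prediction:", int(p_adv > 0.5))
print("max perturbation:", np.abs(x_adv - x).max())
```

In this toy setting every coordinate moves by exactly `eps`, so the perturbation is bounded in the $L_\infty$ norm, and the predicted class flips even though no single feature changes by more than 0.25. For a deep network the same step applies, with the gradient obtained by backpropagation instead of the closed form used here.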