Adversarial Examples: Attacks and Defenses for Deep Learning
Xiaoyong Yuan, Pan He, Qile Zhu, Xiaolin Li
TL;DR
This survey analyzes adversarial examples in deep learning and presents a taxonomy organized by threat model, perturbation, and benchmark. It catalogs a broad range of attack methods, from gradient-based to black-box and cross-model approaches, and surveys defenses including distillation, adversarial training, detectors, reconstruction, and verification. The paper highlights applications across reinforcement learning, generative modeling, face recognition, object detection, segmentation, NLP, and malware detection, and discusses core challenges such as transferability and robust evaluation. It advocates standardized benchmarks and broader robustness research so that deep learning systems can be deployed safely in safety-critical contexts.
Abstract
With rapid progress and significant successes in a wide spectrum of applications, deep learning is being applied in many safety-critical environments. However, deep neural networks have recently been found vulnerable to well-designed input samples, called adversarial examples. Adversarial examples are imperceptible to humans but can easily fool deep neural networks in the testing or deployment stage. This vulnerability has become one of the major risks of applying deep neural networks in safety-critical environments. Attacks and defenses involving adversarial examples have therefore drawn great attention. In this paper, we review recent findings on adversarial examples for deep neural networks, summarize the methods for generating adversarial examples, and propose a taxonomy of these methods. Under the taxonomy, applications of adversarial examples are investigated. We further elaborate on countermeasures against adversarial examples and explore the challenges and potential solutions.
