A Survey on Transferability of Adversarial Examples across Deep Neural Networks

Jindong Gu, Xiaojun Jia, Pau de Jorge, Wenqain Yu, Xinwei Liu, Avery Ma, Yuan Xun, Anjun Hu, Ashkan Khakzar, Zhijiang Li, Xiaochun Cao, Philip Torr

TL;DR

This survey comprehensively examines the transferability of adversarial examples across deep neural networks, detailing formal definitions, evaluation metrics, and two primary families of attacks: optimization-based and generation-based. It catalogs a broad spectrum of techniques within each family, including data augmentation, gradient-based optimization, loss design, feature- and architecture-centered methods, as well as generator-based perturbation synthesis and conditional generation. The discussion extends beyond image classification to vision tasks, NLP, and cross-task or cross-modal settings, highlighting challenges such as imperfect transferability and the need for robust benchmarks and theory. The work underscores the practical implications for safety-critical systems and the development of defenses, while outlining promising directions like diffusion-based attacks, multimodal transferability, and principled evaluation frameworks.

Abstract

The emergence of Deep Neural Networks (DNNs) has revolutionized various domains by enabling the resolution of complex tasks spanning image recognition, natural language processing, and scientific problem-solving. However, this progress has also brought to light a concerning vulnerability: adversarial examples. These crafted inputs, whose perturbations are often imperceptible to humans, can manipulate machine learning models into making erroneous predictions, raising concerns for safety-critical applications. An intriguing property of this phenomenon is the transferability of adversarial examples, where perturbations crafted for one model can deceive another, often with a different architecture. This property enables black-box attacks, which circumvent the need for detailed knowledge of the target model. This survey explores the landscape of adversarial transferability. We categorize existing methodologies for enhancing adversarial transferability and discuss the fundamental principles guiding each approach. While most research concentrates on image classification, we also extend our discussion to other vision tasks and beyond. We discuss challenges and opportunities, highlighting the importance of fortifying DNNs against adversarial vulnerabilities in an evolving landscape.
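
To make the notion of transferability concrete, the following is a minimal sketch, not a method taken from the survey. It assumes two hand-picked linear classifiers stand in for the surrogate and target DNNs, uses a single FGSM-style step as the attack, and measures a simplified fooling rate defined here as the fraction of correctly classified inputs that become misclassified. All names, weights, and data points are hypothetical and chosen only for illustration.

```python
def predict(w, b, x):
    """Label in {-1, +1} predicted by a linear classifier sign(w.x + b)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


def sign(v):
    """Sign of a scalar: -1, 0, or +1."""
    return (v > 0) - (v < 0)


def fgsm_linear(w, x, y, eps):
    """One FGSM-style step against a linear surrogate with logistic loss.

    The gradient of log(1 + exp(-y * (w.x + b))) with respect to x is a
    negative multiple of y * w, so the ascent direction is -y * sign(w).
    """
    return [xi - eps * y * sign(wi) for wi, xi in zip(w, x)]


def fooling_rate(w, b, clean, adversarial, labels):
    """Share of clean-correct inputs that the model misclassifies once perturbed."""
    correct = [i for i, (x, y) in enumerate(zip(clean, labels))
               if predict(w, b, x) == y]
    if not correct:
        return 0.0
    fooled = sum(1 for i in correct
                 if predict(w, b, adversarial[i]) != labels[i])
    return fooled / len(correct)


# Hypothetical surrogate (white-box) and target (black-box) models.
surrogate_w, surrogate_b = [1.0, 0.8], 0.0
target_w, target_b = [0.9, 1.1], 0.1

# Toy inputs and labels, assumed for illustration only.
clean = [[0.3, 0.2], [0.5, 0.1], [-0.2, -0.4],
         [-0.3, -0.1], [1.5, 1.2], [-1.4, -1.6]]
labels = [1, 1, -1, -1, 1, -1]

# Perturbations are crafted using only the surrogate's weights.
eps = 0.5
adversarial = [fgsm_linear(surrogate_w, x, y, eps)
               for x, y in zip(clean, labels)]

surrogate_rate = fooling_rate(surrogate_w, surrogate_b, clean, adversarial, labels)
target_rate = fooling_rate(target_w, target_b, clean, adversarial, labels)
print(f"fooling rate on surrogate: {surrogate_rate:.2f}")
print(f"fooling rate on target:    {target_rate:.2f}")
```

The attack never reads `target_w` or `target_b`, yet some perturbed inputs still cross the target's decision boundary because the two models have similar decision directions. The two inputs far from both boundaries survive the perturbation, which illustrates why transfer is typically partial rather than guaranteed.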

Paper Structure (22 sections, 64 equations, 3 tables)