Delving into Adversarial Transferability on Image Classification: Review, Benchmark, and Evaluation

Xiaosen Wang, Zhijin Ge, Bohan Liu, Zheng Fang, Fengfan Zhou, Ruixuan Zhang, Shaokang Wang, Yuyang Luo

TL;DR

This work proposes a comprehensive framework that serves as a benchmark for evaluating transfer-based attacks, delineates common strategies that enhance adversarial transferability, and highlights prevalent issues that could lead to unfair comparisons.

Abstract

Adversarial transferability refers to the capacity of adversarial examples generated on a surrogate model to deceive other, unseen victim models. This property eliminates the need for direct access to the victim model during an attack, which raises considerable security concerns in practical applications and has recently attracted substantial research attention. In this work, we identify a lack of a standardized framework and criteria for evaluating transfer-based attacks, which can lead to biased assessments of existing approaches. To close this gap, we conduct an exhaustive review of hundreds of related works and organize the various transfer-based attacks into six distinct categories. We then propose a comprehensive framework designed to serve as a benchmark for evaluating these attacks. In addition, we delineate common strategies that enhance adversarial transferability and highlight prevalent issues that could lead to unfair comparisons. Finally, we provide a brief review of transfer-based attacks beyond image classification.

Paper Structure (28 sections, 13 equations, 3 figures, 11 tables)

Figures (3)

  • Figure 1: Categorization of existing transfer-based attacks. We systematically categorize existing transfer-based attacks into six distinct classes: 1) Gradient-based attacks optimize the gradient calculation procedure to derive gradients that are more effective for the attack. 2) Input transformation-based attacks transform the image before it is fed to the model to increase the diversity of the input images. 3) Advanced objective functions replace the conventional cross-entropy loss with a more complex objective to improve attack efficacy. 4) Model-related attacks refine either the forward or the backward propagation process, tailored to the architecture of the surrogate model. 5) Ensemble-based attacks use multiple surrogate models to generate adversarial examples. 6) Generation-based attacks train a generator to craft adversarial examples directly.
  • Figure 2: Overview of existing transfer-based adversarial attacks on Image Classification. These attacks are first divided into two principal types, untargeted and targeted attacks, and then further classified into the six distinct classes depicted in Fig. 1. The arrangement of the referenced papers follows a chronological order, based on their dates of publication.
  • Figure 3: Overview of existing transfer-based adversarial attacks beyond Image Classification. The tasks are categorized into three types: Computer Vision, Natural Language Processing and Multi-Modal Tasks. Subsequently, they are further divided into sub-tasks to depict the nuances of adversarial attacks across various tasks. The arrangement of the referenced papers follows a chronological order, based on their dates of publication.
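
To make the notion of transferability and the gradient-based class from Figure 1 concrete, the following is a minimal sketch, not the paper's benchmark. It assumes toy linear classifiers stored as weight lists, a logistic loss, and a momentum-based iterative sign-gradient update (the form commonly known as MI-FGSM); all function names, weights, and data points are hypothetical. Adversarial examples are crafted using only the surrogate, and the success rate is then measured on a separate victim. On a linear surrogate the gradient sign never changes, so the momentum term is shown for structure only.

```python
import math


def score(w, x):
    """Linear score w . x of a toy classifier with weights w."""
    return sum(wi * xi for wi, xi in zip(w, x))


def predict(w, x):
    """Predicted label in {-1, +1}."""
    return 1 if score(w, x) >= 0 else -1


def loss_grad(w, x, y):
    """Gradient w.r.t. x of the logistic loss log(1 + exp(-y * w . x))."""
    margin = y * score(w, x)
    scale = -y / (1.0 + math.exp(margin))
    return [scale * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def momentum_attack(w, x, y, eps=0.5, steps=10, mu=1.0):
    """Momentum iterative sign-gradient attack within an L-infinity ball of radius eps."""
    alpha = eps / steps          # per-step size
    g = [0.0] * len(x)           # accumulated (momentum) gradient
    x_adv = list(x)
    for _ in range(steps):
        grad = loss_grad(w, x_adv, y)
        l1 = sum(abs(v) for v in grad) or 1.0
        g = [mu * gi + v / l1 for gi, v in zip(g, grad)]
        # Ascend the loss along the sign of the accumulated gradient.
        x_adv = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, g)]
        # Project back into the eps-ball around the clean input.
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv


def transfer_success_rate(surrogate, victim, data, eps=0.5):
    """Fraction of victim-correct samples that the surrogate-crafted examples fool."""
    fooled, total = 0, 0
    for x, y in data:
        if predict(victim, x) != y:
            continue             # skip samples the victim already misclassifies
        total += 1
        x_adv = momentum_attack(surrogate, x, y, eps=eps)
        fooled += predict(victim, x_adv) != y
    return fooled / total if total else 0.0


# Hypothetical toy models and data: the attack never touches victim_w.
surrogate_w = [1.0, 1.0]
victim_w = [1.0, 0.5]
samples = [([0.3, 0.2], 1), ([-0.4, -0.1], -1), ([2.0, 2.0], 1)]
rate = transfer_success_rate(surrogate_w, victim_w, samples)
print(f"transfer success rate: {rate:.2f}")
```

The victim's weights appear only in the evaluation step, which mirrors the black-box setting described in the abstract. Skipping samples the victim already misclassifies is one possible convention; which samples are counted is exactly the kind of evaluation detail that can make comparisons unfair if it varies between works.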