AttackBench: Evaluating Gradient-based Attacks for Adversarial Examples

Antonio Emanuele Cinà, Jérôme Rony, Maura Pintor, Luca Demetrio, Ambra Demontis, Battista Biggio, Ismail Ben Ayed, Fabio Roli

TL;DR

AttackBench addresses biased and non-comparable evaluations of gradient-based attacks by introducing a unified, budget-bounded benchmarking framework. It defines an optimality-based ranking that combines attack effectiveness and efficiency through Local Optimality ($\xi^i_{\boldsymbol{\theta}}$) and Global Optimality (GO), computed from robustness curves via $\rho_a(\varepsilon)$ and the area under the curve $\text{AUREC}_a(\varepsilon_0)$. Experiments across $102$ attacks and roughly $815$ runs on CIFAR-10 and ImageNet reveal that only a subset approaches the empirical optimum, while exposing implementation issues across libraries that can distort results. The framework uses a diverse model zoo and multiple attack libraries to ensure fair, reproducible comparisons, with an open-source approach and plans to extend to black-box attacks and other domains.
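As a rough illustration of how a robustness curve and the area under it can be computed, the sketch below assumes that $\rho_a(\varepsilon)$ is the fraction of samples for which attack $a$ found no adversarial example with perturbation norm at most $\varepsilon$, and that the empirical optimum keeps the smallest perturbation found for each sample across all attacks. These definitions, the function names, and the toy distances are assumptions made for illustration and are not taken from the AttackBench code; the exact optimality formula is not reproduced here.

```python
import math


def robustness_curve(distances, eps):
    """Fraction of samples still robust at budget eps (assumed definition).

    distances[i] is the norm of the smallest adversarial perturbation found
    for sample i; use math.inf when the attack failed on that sample.
    """
    return sum(d > eps for d in distances) / len(distances)


def area_under_curve(distances, eps0):
    """Exact area under the step-shaped robustness curve on [0, eps0].

    Each sample contributes min(d, eps0), because its indicator
    1[d > eps] integrates to that value over [0, eps0].
    """
    return sum(min(d, eps0) for d in distances) / len(distances)


def empirical_best(per_attack_distances):
    """Per-sample minimum over attacks (assumed form of the reference)."""
    return [min(ds) for ds in zip(*per_attack_distances)]


# Hypothetical per-sample distances for two attacks on four samples.
attack_a = [0.0, 0.10, 0.25, math.inf]  # failed on the last sample
attack_b = [0.0, 0.20, 0.15, 0.40]

best = empirical_best([attack_a, attack_b])
area_a = area_under_curve(attack_a, eps0=0.5)
area_best = area_under_curve(best, eps0=0.5)
print(robustness_curve(attack_a, 0.1), area_a, area_best)
```

Under these assumptions, a smaller area means the attack finds smaller perturbations, so an attack whose area is close to that of the per-sample best reference is close to the empirical optimum.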

Abstract

Adversarial examples are typically optimized with gradient-based attacks. While novel attacks are continuously proposed, each is shown to outperform its predecessors using different experimental setups, hyperparameter settings, and numbers of forward and backward calls to the target models. This provides overly optimistic and even biased evaluations that may unfairly favor one particular attack over the others. In this work, we aim to overcome these limitations by proposing AttackBench, the first evaluation framework that enables a fair comparison among different attacks. To this end, we first propose a categorization of gradient-based attacks, identifying their main components and differences. We then introduce our framework, which evaluates their effectiveness and efficiency. We measure these characteristics by (i) defining an optimality metric that quantifies how close an attack is to the optimal solution, and (ii) limiting the number of forward and backward queries to the model, such that all attacks are compared within a given maximum query budget. Our extensive experimental analysis compares more than $100$ attack implementations with a total of over $800$ different configurations against CIFAR-10 and ImageNet models, highlighting that only very few attacks outperform all the competing approaches. Within this analysis, we shed light on several implementation issues that prevent many attacks from finding better solutions or running at all. We release AttackBench as a publicly available benchmark, aiming to continuously update it to include and evaluate novel gradient-based attacks for optimizing adversarial examples.
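The query budget in point (ii) can be pictured with a minimal sketch: a wrapper that counts every forward and backward call and stops the attack once the limit is reached. The class and function names, the toy quadratic loss, and the choice to count one unit per call are all hypothetical; how AttackBench itself tracks queries is not described in this summary.

```python
class BudgetExceeded(RuntimeError):
    """Raised when an attack requests more queries than the budget allows."""


class BudgetedModel:
    """Wraps forward and backward callables and counts each call as one query."""

    def __init__(self, forward_fn, backward_fn, max_queries):
        self._forward_fn = forward_fn
        self._backward_fn = backward_fn
        self.max_queries = max_queries
        self.used = 0

    def _spend(self):
        # Refuse the call before it happens, so `used` never exceeds the budget.
        if self.used >= self.max_queries:
            raise BudgetExceeded(f"budget of {self.max_queries} queries exhausted")
        self.used += 1

    def forward(self, x):
        self._spend()
        return self._forward_fn(x)

    def backward(self, x):
        self._spend()
        return self._backward_fn(x)


def toy_attack(model, x, step=0.1):
    """Gradient descent on a toy loss; keeps the best point seen within budget."""
    best_x, best_loss = x, float("inf")
    try:
        while True:
            loss = model.forward(x)
            if loss < best_loss:
                best_x, best_loss = x, loss
            x = x - step * model.backward(x)
    except BudgetExceeded:
        pass  # the budget ends the attack; report the best result so far
    return best_x, best_loss


# Toy stand-in for a model: loss (x - 3)^2 with gradient 2 (x - 3).
model = BudgetedModel(
    forward_fn=lambda x: (x - 3.0) ** 2,
    backward_fn=lambda x: 2.0 * (x - 3.0),
    max_queries=20,
)
best_x, best_loss = toy_attack(model, x=0.0)
print(model.used, best_x, best_loss)
```

Enforcing the limit inside the wrapper, instead of trusting each attack's own step counter, means every attack is cut off at the same number of model calls regardless of how it schedules its iterations.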

Paper Structure

This paper contains 21 sections, 7 equations, 14 figures, 17 tables.

Figures (14)

  • Figure 1: A comprehensive overview of the five stages of AttackBench. Each attack is tested in fair conditions and ranked through the optimality metric. The best attack is the one that produces minimally-perturbed adversarial examples with fewer queries.
  • Figure 2: Attack Benchmarking
  • Figure 3: Robustness evaluation curves of $a^{i}$ and $a^{\star}$.
  • Figure 4: Computing Local Optimality
  • Figure 5: Robustness evaluation curves for the $7$ best $\ell_0$, $\ell_1$, $\ell_2$, and $\ell_\infty$-norm attacks against the C3 model (CCAT, Stutz et al., 2019).
  • ...and 9 more figures