PEAS: A Strategy for Crafting Transferable Adversarial Examples

Bar Avraham, Yisroel Mirsky

TL;DR

This work proposes PEAS, a novel strategy that boosts the transferability of existing black box attacks. It also introduces a perceptual equivalence-based search space that challenges the common \(\epsilon\)-ball constraint used in adversarial machine learning, and shows that natural augmentations alone can induce adversarial failures.

Abstract

Black box attacks, where adversaries have limited knowledge of the target model, pose a significant threat to machine learning systems. Adversarial examples generated with a substitute model often suffer from limited transferability to the target model. While recent work explores ranking perturbations for improved success rates, these methods see only modest gains. We propose a novel strategy called PEAS that can boost the transferability of existing black box attacks. PEAS leverages the insight that samples which are perceptually equivalent exhibit significant variability in their adversarial transferability. Our approach first generates a set of images from an initial sample via subtle augmentations. We then evaluate the transferability of adversarial perturbations on these images using a set of substitute models. Finally, the most transferable adversarial example is selected and used for the attack. Our experiments show that PEAS can double the performance of existing attacks, achieving a 2.5x improvement in attack success rates on average over current ranking methods. We thoroughly evaluate PEAS on ImageNet and CIFAR-10, analyze hyperparameter impacts, and provide an ablation study to isolate each component's importance.
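
The three steps described in the abstract (sample perceptually equivalent images, attack each one, keep the candidate that the substitute models rate as most transferable) can be illustrated with a minimal sketch. It rests on several assumptions that are not from the paper: toy linear binary classifiers stand in for neural networks, the attack is a single signed-gradient step, the augmentations are a one-pixel circular shift plus a small brightness change, and the transferability score is the mean wrong-class probability over the substitute set. All function names (`sample_equivalents`, `fgsm_linear`, `expected_transferability`, `peas_select`) are hypothetical.

```python
import numpy as np

def sample_equivalents(x, n, rng, max_shift=1, max_brightness=0.03):
    """Step 1: draw n subtly augmented copies of image x (H x W, values in [0, 1])."""
    samples = []
    for _ in range(n):
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        shifted = np.roll(x, shift=(int(dy), int(dx)), axis=(0, 1))
        delta = rng.uniform(-max_brightness, max_brightness)
        samples.append(np.clip(shifted + delta, 0.0, 1.0))
    return samples

def fgsm_linear(x, y, w, eps):
    """Step 2: one signed-gradient step against a toy linear model (logit = w.x + b).
    For true label y=1 the logistic loss grows when the logit shrinks, and vice versa."""
    direction = -np.sign(w) if y == 1 else np.sign(w)
    return np.clip(x + eps * direction, 0.0, 1.0)

def wrong_class_prob(x, y, w, b):
    """Probability a toy linear model assigns to the wrong class."""
    logit = np.clip(float(np.sum(w * x) + b), -50.0, 50.0)
    p1 = 1.0 / (1.0 + np.exp(-logit))
    return 1.0 - p1 if y == 1 else p1

def expected_transferability(x_adv, y, substitutes):
    """Step 3 (assumed score): mean wrong-class probability over the substitute set."""
    return float(np.mean([wrong_class_prob(x_adv, y, w, b) for w, b in substitutes]))

def peas_select(x, y, attack_model, substitutes, n=20, eps=0.03, seed=0):
    """Step 4: return the candidate with the highest expected transferability."""
    rng = np.random.default_rng(seed)
    w_attack, _ = attack_model
    candidates = [fgsm_linear(s, y, w_attack, eps)
                  for s in sample_equivalents(x, n, rng)]
    scores = [expected_transferability(c, y, substitutes) for c in candidates]
    best = int(np.argmax(scores))
    return candidates[best], scores[best], scores
```

A call such as `peas_select(x, 1, (w_attack, 0.0), substitutes, n=10)` returns the selected image, its score, and the scores of all candidates. The spread of those scores is what the selection step exploits: if every augmented copy transferred equally well, choosing among them would gain nothing.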

Paper Structure

This paper contains 13 sections, 6 figures, 4 tables, 1 algorithm.

Figures (6)

  • Figure 1: The attack process of PEAS: (1) explore the space around input $x$ by generating perceptually equivalent images with a sampling function (e.g., subtle image augmentations), (2) attack each sample using any adversarial example algorithm (e.g., a white box attack on substitute model $f'$), (3) measure the expected transferability of each sample using a set of substitute models $\mathcal{F}$, (4) select the sample that has the highest expected transferability score ($x^*$) and use it for the attack on the victim's model $f$.
  • Figure 2: This example demonstrates how subtle augmentations can result in large $(\ell_2, \ell_\infty)$ distances from the original image yet remain perceptually equivalent. Therefore, we argue that these subtle transformations can be used in an adversarial example attack.
  • Figure 3: Sample images before ($x$) and after ($x^*$) the application of the BTA-PEAS attack using two different sampling functions, $S_1$ and $S_2$. The left image is $x$ (correctly classified by $f$) and the right image is the black box adversarial example (misclassified by $f$).
  • Figure 4: The effect of the exploration size $n$ on the performance of BTA-PEAS and the Vanilla ranking strategy. The gray margin captures the confidence interval for $p=0.99$.
  • Figure 5: The performance of each augmentation in BTA-PEAS as a function of the number of instances of $x$ created during the exploration step. The gray margin captures the confidence interval for $p=0.95$.
  • ...and 1 more figure
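
The point made in the Figure 2 caption, that a subtle transformation can sit far from the original in $(\ell_2, \ell_\infty)$ distance, can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's experiment: it uses a synthetic 32x32 striped image (a deliberately extreme case; natural images give smaller gaps), a one-pixel circular shift as the stand-in augmentation, and an $\ell_\infty$ budget of 8/255 as a typical $\epsilon$-ball perturbation for comparison.

```python
import numpy as np

def l2_linf(a, b):
    """Return the (l2, l_inf) distances between two images."""
    d = (a - b).ravel()
    return float(np.linalg.norm(d)), float(np.max(np.abs(d)))

# Synthetic image: vertical stripes one pixel wide, values 0.25 and 0.75.
x = np.full((32, 32), 0.25)
x[:, ::2] = 0.75

# Stand-in augmentation: shift the image one pixel to the right (circular).
shifted = np.roll(x, 1, axis=1)

# Comparison: a perturbation that uses the full l_inf budget of 8/255 on every pixel.
eps = 8.0 / 255.0
rows, cols = np.arange(32), np.arange(32)
signs = np.where(np.add.outer(rows, cols) % 2 == 0, 1.0, -1.0)
noisy = x + eps * signs  # stays inside [0, 1], so no clipping is needed

l2_shift, linf_shift = l2_linf(x, shifted)
l2_noise, linf_noise = l2_linf(x, noisy)

print(f"one-pixel shift: l2={l2_shift:.3f}, l_inf={linf_shift:.3f}")
print(f"eps-ball noise:  l2={l2_noise:.3f}, l_inf={linf_noise:.3f}")
```

On this image the shift changes every pixel by 0.5, so its $\ell_\infty$ distance is about 16 times the 8/255 budget even though the stripe pattern looks the same to a viewer. This is the sense in which a fixed $\epsilon$-ball excludes transformations that are perceptually negligible.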