Transferable Adversarial Examples with Bayes Approach

Mingyuan Fan, Cen Chen, Wenmeng Zhou, Yinggui Wang

TL;DR

The paper tackles the problem of black-box transferability of adversarial examples by introducing BayAtk, a Bayesian framework that uses transferability-promoting priors to encourage disruption of cross-model features. It defines pixel-level removal and region-based soft removal priors and combines them with an adaptive dynamic weighting strategy to generate highly transferable adversarial inputs. Extensive experiments on ImageNet and real-world systems (Google MLaaS and Claude3) show BayAtk outperforms state-of-the-art transfer attacks, including under defense scenarios, and demonstrates practical efficiency. The results illuminate a principled way to study transferability through priors and have implications for both attacking and defending DNN-based systems in security-critical applications.

Abstract

The vulnerability of deep neural networks (DNNs) to black-box adversarial attacks is one of the most actively studied topics in trustworthy AI. In such attacks, the attacker has no insider knowledge of the target model, which makes the cross-model transferability of adversarial examples critical. Although adversarial examples can be effective across various models, examples crafted for one specific model have been observed to transfer poorly. In this paper, we explore the transferability of adversarial examples through the lens of a Bayesian approach. Specifically, we use the Bayesian approach to probe transferability and then study what constitutes a transferability-promoting prior. Following this, we design two concrete transferability-promoting priors, along with an adaptive dynamic weighting strategy for instances sampled from these priors. Employing these techniques, we present BayAtk. Extensive experiments show that BayAtk crafts more transferable adversarial examples than existing state-of-the-art attacks against both undefended and defended black-box models.
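As a rough illustration of the pipeline the abstract outlines (sample instances from a prior, weight them, update the example), the following NumPy sketch runs a sign-gradient attack on a toy linear softmax proxy. Everything specific here is an assumption made for illustration and not the paper's method: the Bernoulli pixel mask standing in for a removal prior, the softmax-over-losses weights standing in for the adaptive dynamic weighting strategy, the linear proxy model, and the names `sample_masks`, `weighted_grad`, and `attack`.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sample_masks(rng, n, dim, drop_prob):
    # Hypothetical removal prior: each pixel is zeroed independently.
    return (rng.random((n, dim)) >= drop_prob).astype(np.float64)

def weighted_grad(W, x, label, masks):
    # Toy proxy model: logits = W @ (mask * x), one row per sampled instance.
    probs = softmax((masks * x) @ W.T)               # (n, classes)
    losses = -np.log(probs[:, label] + 1e-12)        # cross-entropy per instance
    # Assumed weighting (treated as constants): favor instances the proxy
    # still classifies well, i.e. those with low loss.
    weights = softmax(-losses)
    onehot = np.eye(W.shape[0])[label]
    grads = masks * ((probs - onehot) @ W)           # d loss_i / d x, shape (n, dim)
    return (weights[:, None] * grads).sum(axis=0)

def attack(W, x, label, eps=0.1, step=0.02, iters=10, n=8, drop_prob=0.3, seed=0):
    rng = np.random.default_rng(seed)
    x_adv = x.copy()
    for _ in range(iters):
        masks = sample_masks(rng, n, x.size, drop_prob)
        g = weighted_grad(W, x_adv, label, masks)
        x_adv = x_adv + step * np.sign(g)            # ascend the weighted loss
        x_adv = np.clip(x_adv, x - eps, x + eps)     # stay inside the L-infinity ball
    return x_adv
```

With a real network, the analytic gradient in `weighted_grad` would be replaced by backpropagation through the proxy model; the loop structure is the only part this sketch is meant to convey.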

Paper Structure

This paper contains 23 sections, 1 theorem, 13 equations, 5 figures, 10 tables, 2 algorithms.

Key Result

Theorem 1

The pixel-level removal prior is a transferability-promoting prior.

Figures (5)

  • Figure 1: Overview of the proposed attack, BayAtk. In each iteration, BayAtk samples several instances from the transferability-promoting priors and feeds them into the proxy model to obtain the model's prediction probabilities. BayAtk then weights these probabilities using the adaptive dynamic weighting strategy. Finally, BayAtk uses backpropagation to compute the update.
  • Figure 2: We divide the leftmost image into different regions and compute the sensitivity of ResNet50 and DenseNet121 to the features in these regions (by summing the absolute values of the gradients). We see that model-specific features learned by ResNet50 and DenseNet121 (the bright squares) are located in different positions, one in the upper left corner and the other in the lower right corner. In contrast, cross-model features are more concentrated in the central area of the image. If we directly maximize the loss of the sample on ResNet50, the generated adversarial example tends to overly focus on disrupting the features in the upper left corner while neglecting the features in the central part of the image.
  • Figure 3: Three successful attack cases using BayAtk. On the left are the adversarial examples generated by BayAtk, and on the right are the Top-5 predictions from the Google MLaaS Classification System for these examples.
  • Figure 4: The ASRs and runtimes of BayAtk over different attack iterations.
  • Figure 5: The responses of Claude3 to the adversarial examples generated by BayAtk. Claude3 was prompted with "Describe this image."
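
The region-level sensitivity described in the Figure 2 caption (summing the absolute values of the gradients within each region) can be sketched as follows. This is a minimal NumPy sketch, assuming the input gradient is already available as an `(H, W, C)` array and that the regions form a regular grid; the function name `region_sensitivity` and the `grid` parameter are illustrative, and the paper's actual region layout may differ.

```python
import numpy as np

def region_sensitivity(grad, grid):
    """Sum |grad| inside each cell of a grid x grid partition of the image."""
    h, w = grad.shape[:2]
    if h % grid or w % grid:
        raise ValueError("image height and width must be divisible by grid")
    # Per-pixel gradient magnitude, summed over channels.
    mag = np.abs(grad).reshape(h, w, -1).sum(axis=2)
    # Split rows and columns into grid blocks, then sum within each block.
    cells = mag.reshape(grid, h // grid, grid, w // grid)
    return cells.sum(axis=(1, 3))                    # (grid, grid) sensitivity map

# Usage with a stand-in gradient (a real one would come from backpropagation).
grad = np.zeros((4, 4, 3))
grad[0, 0] = [1.0, -2.0, 0.5]                        # strong response in the upper-left region
print(region_sensitivity(grad, grid=2))
```

Computing this map for two different models on the same image and comparing the brightest cells is one way to reproduce the kind of comparison the caption describes.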

Theorems & Definitions (3)

  • Definition 1
  • Definition 2
  • Theorem 1