Improving Transferability of Adversarial Examples via Bayesian Attacks

Qizhang Li, Yiwen Guo, Xiaochen Yang, Wangmeng Zuo, Hao Chen

TL;DR

This work addresses the vulnerability of DNNs to adversarial examples by improving transferability through a Bayesian attack framework that jointly samples substitute-model parameters from $p(\mathbf{w}|\mathcal{D})$ and input perturbations from $p(\mathbf{e}|\mathbf{x},\mathbf{w})$. By employing Monte Carlo sampling, Gaussian and SWAG-based posterior approximations, and (optionally) fine-tuning, the method crafts adversarial examples that generalize across unseen victim models, achieving new state-of-the-art transferability on ImageNet and CIFAR-10. The approach is shown to outperform non-Bayesian baselines, strengthen performance when combined with other attacks, and remain effective against defense strategies such as adversarial training and input transformations. This work advances understanding of how both input and parameter uncertainty contribute to transferability and provides practical tools for evaluating and stress-testing defenses in real-world systems.
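
One way to read the joint sampling described above is as a Monte Carlo estimate of an expected loss. The following is a sketch under assumptions, not necessarily the paper's exact objective: $L$ denotes a classification loss, $y$ the true label, $\Delta\mathbf{x}$ the adversarial perturbation, $\epsilon$ the $\ell_\infty$ budget, and $M$ and $S$ the numbers of parameter and input samples (which symbol counts which is assumed here).

$$
\max_{\|\Delta\mathbf{x}\|_\infty \le \epsilon}\;
\mathbb{E}_{\mathbf{w}\sim p(\mathbf{w}|\mathcal{D})}\,
\mathbb{E}_{\mathbf{e}\sim p(\mathbf{e}|\mathbf{x},\mathbf{w})}
\big[L(\mathbf{x}+\Delta\mathbf{x}+\mathbf{e},\, y;\, \mathbf{w})\big]
\;\approx\;
\max_{\|\Delta\mathbf{x}\|_\infty \le \epsilon}\;
\frac{1}{MS}\sum_{j=1}^{M}\sum_{k=1}^{S}
L(\mathbf{x}+\Delta\mathbf{x}+\mathbf{e}_k,\, y;\, \mathbf{w}_j),
\qquad \mathbf{w}_j\sim p(\mathbf{w}|\mathcal{D}),\;\; \mathbf{e}_k\sim p(\mathbf{e}|\mathbf{x},\mathbf{w}_j).
$$

Under this reading, a non-Bayesian attack corresponds to $M=S=1$ with both distributions collapsed to a point, so the perturbation is fit to a single substitute model at a single input.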

Abstract

The transferability of adversarial examples enables attacks on unknown deep neural networks (DNNs), posing a serious threat to many applications and attracting great attention. In this paper, we improve the transferability of adversarial examples by incorporating a Bayesian formulation into both the model parameters and the model input, enabling their joint diversification. We demonstrate that combining Bayesian formulations for the model input and the model parameters yields significant improvements in transferability. By introducing advanced approximations of the posterior distribution over the model input, adversarial transferability is further enhanced, surpassing all state-of-the-art methods when attacking without model fine-tuning. Additionally, we propose a principled approach to fine-tuning model parameters within this Bayesian framework. Extensive experiments demonstrate that our method achieves a new state of the art in transfer-based attacks, significantly improving the average success rate on ImageNet and CIFAR-10. Code at: https://github.com/qizhangli/MoreBayesian-jrnl.
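
The joint diversification described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a toy linear softmax classifier in place of a DNN, isotropic Gaussian noise around both the parameters and the input, and hypothetical names and defaults (`bayesian_attack`, `sigma_w`, `sigma_e`, `M`, `S`). Gradients are averaged over the sampled parameters and inputs, then used in an iterative sign-gradient step within the $\ell_\infty$ ball.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()


def loss_and_input_grad(W, x, y):
    """Cross-entropy loss of a toy linear classifier (logits = W @ x)
    and its gradient with respect to the input x."""
    p = softmax(W @ x)
    loss = -np.log(p[y] + 1e-12)
    onehot = np.zeros_like(p)
    onehot[y] = 1.0
    grad_x = W.T @ (p - onehot)  # d(loss)/dx for softmax cross-entropy
    return loss, grad_x


def bayesian_attack(W_mean, x, y, eps=8 / 255, steps=10, M=5, S=3,
                    sigma_w=0.01, sigma_e=0.01, seed=0):
    """Iterative sign-gradient attack with Monte Carlo averaging.

    Assumptions (illustrative, not taken from the paper's code):
    - parameters are drawn from an isotropic Gaussian around W_mean;
    - input noise is drawn from an isotropic Gaussian around x_adv;
    - M parameter samples and S input samples are used per step.
    """
    rng = np.random.default_rng(seed)
    step_size = eps / steps
    x_adv = x.copy()
    for _ in range(steps):
        grad = np.zeros_like(x)
        for _ in range(M):
            # Sample substitute-model parameters w_j.
            W = W_mean + sigma_w * rng.standard_normal(W_mean.shape)
            for _ in range(S):
                # Sample an input perturbation e_k and accumulate the gradient.
                e = sigma_e * rng.standard_normal(x.shape)
                _, g = loss_and_input_grad(W, x_adv + e, y)
                grad += g
        # Ascend the averaged loss, then project back to the eps-ball and [0, 1].
        x_adv = x_adv + step_size * np.sign(grad)
        x_adv = np.clip(x_adv, x - eps, x + eps)
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    W = rng.standard_normal((4, 6))       # 4 classes, 6 input features
    x = rng.uniform(0.3, 0.7, size=6)     # toy "image" with values in [0, 1]
    x_adv = bayesian_attack(W, x, y=2)
    print("max |x_adv - x|:", np.max(np.abs(x_adv - x)))
```

Setting `sigma_w=0` and `sigma_e=0` reduces the sketch to a plain iterative sign-gradient attack on a single model, which makes it convenient for comparing the Bayesian and non-Bayesian settings on the same toy problem.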

Paper Structure (16 sections, 22 equations, 3 figures, 7 tables, 2 algorithms)

Figures (3)

  • Figure 1: Comparison of success rates in attacking 10 different victim models when adversarial examples are generated on a substitute model (ResNet-50) in a non-Bayesian manner (i.e., I-FGSM) and using Bayesian modeling for model parameters ${\mathbf{w}}_j$, inputs ${\mathbf{e}}_k$ (i.e., vr-IGSM; Wu et al., 2018), and both ${\mathbf{w}}_j$ and ${\mathbf{e}}_k$ (without fine-tuning). Dotted lines indicate the average success rate across all 10 victim models. We performed $\ell_\infty$ attacks with $\epsilon=8/255$. Best viewed in color.
  • Figure 2: Comparison of adversarial attacks with and without the proposed model fine-tuning, both using isotropic Gaussian posteriors over model parameters and inputs. Dotted lines indicate the average success rates over all 10 victim models. We performed $\ell_\infty$ attacks with $\epsilon=8/255$. Best viewed in color.
  • Figure 3: Average success rates of attacking 10 victim models on ImageNet with varying $M$ and $S$. Darker cubes indicate better performance. Best viewed in color.