Improving Adversarial Transferability by Stable Diffusion

Jiayang Liu, Siyu Zhu, Siyuan Liang, Jie Zhang, Han Fang, Weiming Zhang, Ee-Chien Chang

TL;DR

This work tackles the problem of adversarial transferability by introducing SDAM, a diffusion-guided augmentation strategy that mixes the input with multiple Stable Diffusion samples to improve cross-model transferability. A fast SDAM variant reduces computational overhead by reusing initial diffusion samples across iterations. Extensive experiments across normally trained, adversarially trained, and defense models show that SDAM outperforms state-of-the-art baselines and remains compatible with existing transfer-based attacks, highlighting diffusion-generated data as a powerful augmentation for adversarial transferability.

Abstract

Deep neural networks (DNNs) are susceptible to adversarial examples, which introduce imperceptible perturbations to benign samples, deceiving DNN predictions. While some attack methods excel in the white-box setting, they often struggle in the black-box scenario, particularly against models fortified with defense mechanisms. Various techniques have emerged to enhance the transferability of adversarial attacks for the black-box scenario. Among these, input transformation-based attacks have demonstrated their effectiveness. In this paper, we explore the potential of leveraging data generated by Stable Diffusion to boost adversarial transferability. This approach draws inspiration from recent research that harnessed synthetic data generated by Stable Diffusion to enhance model generalization. In particular, previous work has highlighted the correlation between the presence of both real and synthetic data and improved model generalization. Building upon this insight, we introduce a novel attack method called Stable Diffusion Attack Method (SDAM), which incorporates samples generated by Stable Diffusion to augment input images. Furthermore, we propose a fast variant of SDAM to reduce computational overhead while preserving high adversarial transferability. Our extensive experimental results demonstrate that our method outperforms state-of-the-art baselines by a substantial margin. Moreover, our approach is compatible with existing transfer-based attacks to further enhance adversarial transferability.
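
The mixing idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `mixed_sample_attack`, the mixing weight `eta`, the iterative sign-gradient base attack, and the toy quadratic loss are all assumptions made for illustration, and random vectors stand in for actual Stable Diffusion outputs. The synthetic samples are generated once and reused in every iteration, in the spirit of the fast variant.

```python
import random


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def mixed_sample_attack(x, synth, grad_fn, eps=0.1, steps=5, eta=0.3):
    """Iterative sign-gradient attack whose gradient is averaged over
    copies of the current image mixed with synthetic samples.

    x       : clean image as a flat list of floats in [0, 1]
    synth   : list of synthetic samples (same length as x), generated once
    grad_fn : hypothetical callable returning dLoss/dInput of a surrogate model
    eta     : hypothetical mixing weight given to the synthetic sample
    """
    alpha = eps / steps  # per-step size so the total budget is eps
    adv = list(x)
    for _ in range(steps):
        avg_grad = [0.0] * len(x)
        for s in synth:
            # Blend the current adversarial image with one synthetic sample.
            mixed = [(1.0 - eta) * a + eta * si for a, si in zip(adv, s)]
            g = grad_fn(mixed)
            # Chain rule: d(mixed)/d(adv) = (1 - eta); average over samples.
            avg_grad = [t + (1.0 - eta) * gi / len(synth)
                        for t, gi in zip(avg_grad, g)]
        stepped = [a + alpha * sign(g) for a, g in zip(adv, avg_grad)]
        # Project into the eps-ball around x, then into the pixel range [0, 1].
        adv = [min(max(v, xi - eps, 0.0), xi + eps, 1.0)
               for v, xi in zip(stepped, x)]
    return adv


# Toy usage: a 4-"pixel" image and a quadratic stand-in for a surrogate loss.
random.seed(0)
x = [0.2, 0.5, 0.8, 0.4]
prototype = [0.3, 0.4, 0.9, 0.1]  # hypothetical class prototype

# Random vectors stand in for diffusion-generated samples (assumption).
synth = [[random.random() for _ in x] for _ in range(3)]


def toy_grad(z):
    """Input gradient of the toy loss 0.5 * ||z - prototype||^2."""
    return [zi - pi for zi, pi in zip(z, prototype)]


eps = 0.1
adv = mixed_sample_attack(x, synth, toy_grad, eps=eps)
print(adv)
```

In a real setting, `grad_fn` would be replaced by backpropagation through a surrogate classifier and `synth` by images produced by a diffusion model. Averaging the gradient over several mixed copies is what makes this an input-transformation-style attack: the update direction depends less on any single view of the input.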

Paper Structure

This paper contains 27 sections, 6 equations, 2 figures, and 5 tables.

Figures (2)

  • Figure 1: The overall framework of Stable Diffusion Attack Method.
  • Figure 2: The attack success rates (%) of our method with different values of hyper-parameters.