Towards more transferable adversarial attack in black-box manner

Chun Tong Lei, Zhongliang Guo, Hon Chung Lee, Minh Quoc Duong, Chun Pong Lau

TL;DR

This work tackles the computational cost of using diffusion-based adversarial purification models as surrogates in black-box attacks. It introduces ScorePGD and Universal ScorePGD (U-ScorePGD), which leverage the score of a time-dependent classifier to inject knowledge of the noised data distribution into adversarial optimization. By replacing full diffusion chains with guidance from a time-dependent classifier, the methods achieve high transferability across diverse architectures, including diffusion-defended targets, while dramatically reducing runtime and VRAM requirements. Empirical results show strong black-box transfer performance and pronounced anti-purification effects, with ScorePGD often excelling against protected models and U-ScorePGD delivering superior performance on unprotected targets, all at an order-of-magnitude speedup over DiffPGD. These findings suggest that knowledge of noised data distributions, rather than the complete diffusion process, is the key driver of transferable adversarial attacks, enabling efficient robustness evaluation in resource-constrained settings.

Abstract

Adversarial attacks have become a well-explored domain, frequently serving as evaluation baselines for model robustness. Among these, black-box attacks based on transferability have received significant attention due to their practical applicability in real-world scenarios. Traditional black-box methods have generally focused on improving the optimization framework (e.g., utilizing momentum in MI-FGSM) to enhance transferability, rather than examining the dependency on surrogate white-box model architectures. The recent state-of-the-art approach DiffPGD has demonstrated enhanced transferability by employing diffusion-based adversarial purification models for adaptive attacks. The inductive bias of diffusion-based adversarial purification aligns naturally with the adversarial attack process, since both involve noise addition, which reduces the dependency on surrogate white-box model selection. However, the denoising process of diffusion models incurs substantial computational costs because gradients must be propagated through the whole denoising chain via the chain rule, resulting in excessive VRAM consumption and extended runtime. This prompts us to question whether introducing diffusion models is necessary. We hypothesize that a model sharing a similar inductive bias with diffusion-based adversarial purification, combined with an appropriate loss function, could achieve comparable or superior transferability while dramatically reducing computational overhead. In this paper, we propose a novel loss function coupled with a unique surrogate model to validate our hypothesis. Our approach leverages the score of the time-dependent classifier from classifier-guided diffusion models, effectively incorporating natural data distribution knowledge into the adversarial optimization process. Experimental results demonstrate significantly improved transferability across diverse model architectures while maintaining robustness against diffusion-based defenses.

Paper Structure

This paper contains 27 sections, 15 equations, 4 figures, 13 tables, 2 algorithms.

Figures (4)

  • Figure 1: The top part shows that our method uses less runtime and VRAM than the current SOTA method DiffPGD [xue2023diffusion], while achieving higher black-box transferability against targets protected by a diffusion-based purification method. The black-box transferability result is measured on ResNet101; more details can be found in the experiment section. The bottom part visualizes the ability of our proposed ScorePGD to disrupt image-editing tasks. These images are generated with $\gamma=8/255$ and $t=0.5T$ for the diffusion model $D$.
  • Figure 2: Illustration of our method. $f_\phi(x_t, t)$ is a classifier trained on noised images, with the noise scale increasing with the timestep $t$. We simultaneously compute the cross-entropy loss $\mathcal{L}_c$, which is optional, and the log-likelihood of the ground-truth label. The variant that computes $\mathcal{L}_c$ is U-ScorePGD; the variant without it is ScorePGD. During optimization, we iteratively maximize the cross-entropy and minimize the log-likelihood.
  • Figure 3: Illustration of ScorePGD's objective. Our method aims to change the guidance direction of the diffusion model's reverse process. The guidance direction is the score of the time-dependent classifier, so the ScorePGD perturbation distills wrong guidance information into the image, leading diffusion-based editing or purification to work toward the wrong class.
  • Figure 4: Visualization of the $\ell_2$-based adversarial attack experiment with the setting in \ref{l2 setting}. (a) Original image. (b) Adversarial image of DiffPGD. (c) Perturbation of DiffPGD. (d) Adversarial image of U-ScorePGD (Ours). (e) Perturbation of U-ScorePGD (Ours). We scale the perturbation values by a factor of five for better visibility.
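
As a rough illustration of the idea in the Figure 2 caption, the following minimal sketch runs a PGD-style loop that lowers the log-likelihood of the ground-truth label under a classifier evaluated on noised inputs. It is a toy under stated assumptions, not the authors' implementation: the linear softmax classifier `W`, the single noise level `alpha_bar`, the step size, the iteration count, and all function names are hypothetical, and the optional cross-entropy term $\mathcal{L}_c$ of U-ScorePGD is omitted. Only the budget $\gamma=8/255$ is taken from the Figure 1 caption.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def log_likelihood_and_grad(x, y, W, alpha_bar, noise):
    """Return log p(y | x_t, t) and its gradient w.r.t. the clean input x.

    Assumption: a toy linear classifier with logits = W @ x_t, where
    x_t = sqrt(alpha_bar) * x + sqrt(1 - alpha_bar) * noise.
    """
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    x_t = [a * xi + s * ni for xi, ni in zip(x, noise)]
    logits = [sum(w * v for w, v in zip(row, x_t)) for row in W]
    p = softmax(logits)
    # d log p_y / d logit_k = 1[k == y] - p_k, chained through x_t = a * x + ...
    grad = [
        a * sum(((1.0 if k == y else 0.0) - p[k]) * W[k][j] for k in range(len(W)))
        for j in range(len(x))
    ]
    return math.log(p[y]), grad


def score_pgd_sketch(x, y, W, gamma=8 / 255, step=2 / 255, iters=10,
                     alpha_bar=0.5, seed=0):
    """Sign-gradient descent on log p(y | x_t, t) inside an L_inf ball of radius gamma."""
    rng = random.Random(seed)
    x_adv = list(x)
    for _ in range(iters):
        # Fresh Gaussian noise each iteration (hypothetical choice).
        noise = [rng.gauss(0.0, 1.0) for _ in x]
        _, grad = log_likelihood_and_grad(x_adv, y, W, alpha_bar, noise)
        # Step against the gradient to reduce the true-label log-likelihood.
        x_adv = [v - step * sign(g) for v, g in zip(x_adv, grad)]
        # Project back into the gamma-ball around x, then into the valid range [0, 1].
        x_adv = [min(max(v, xi - gamma), xi + gamma) for v, xi in zip(x_adv, x)]
        x_adv = [min(max(v, 0.0), 1.0) for v in x_adv]
    return x_adv


if __name__ == "__main__":
    x = [0.5, 0.5, 0.5]                        # toy "image" with three pixels
    W = [[2.0, -1.0, 0.5], [-1.0, 1.5, -0.5]]  # toy two-class classifier weights
    print(score_pgd_sketch(x, y=0, W=W))
```

In a realistic setting the toy classifier would be replaced by a trained time-dependent network and the analytic gradient by automatic differentiation. The sketch only shows where the noising step and the projection sit in the loop.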