Enhancing Adversarial Attacks: The Similar Target Method

Shuo Zhang, Ziruo Wang, Zikai Zhou, Huanran Chen

TL;DR

Adversarial examples exhibit strong transferability, threatening deployed models even without access to their internals. The paper introduces the Similar Target (ST) method and its instantiation MI-ST, which regularizes attack optimization by maximizing gradient cosine similarity across surrogate models while preserving the original loss objective. Through a Taylor-based derivation, the authors show MI-ST achieves lower approximation error than prior cosine-based approaches and can be integrated with existing attacks and transformations. Empirical results on ImageNet-related tasks demonstrate substantial gains in transferability against both standard and adversarially trained models, underscoring the need for more robust defenses against gradient-alignment strategies.

Abstract

Deep neural networks are vulnerable to adversarial examples, which threatens their deployment and raises security concerns. An intriguing property of adversarial examples is their strong transferability. Several methods have been proposed to enhance transferability, including ensemble attacks, which have proven effective. However, prior approaches simply average logits, probabilities, or losses across the ensemble and lack a comprehensive analysis of how and why model ensembling significantly improves transferability. In this paper, we propose an attack method named Similar Target (ST). By promoting cosine similarity between the gradients of each model, our method regularizes the optimization direction to simultaneously attack all surrogate models. This strategy has been proven to enhance generalization ability. Experimental results on ImageNet validate the effectiveness of our approach in improving adversarial transferability. Our method outperforms state-of-the-art attackers on 18 discriminative classifiers and adversarially trained models.
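
To make the idea of promoting gradient cosine similarity concrete, the following is a minimal sketch, not the authors' MI-ST algorithm. It assumes the input gradients of each surrogate model are already available as plain vectors. The helper names (`cosine`, `mean_pairwise_cosine`, `regularized_objective`) and the single weight `lam` are hypothetical and do not correspond to the paper's $\lambda_1$ and $\lambda_2$.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(grads):
    """Average cosine similarity over all unordered pairs of gradients."""
    pairs = list(combinations(grads, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def regularized_objective(losses, grads, lam):
    """Mean surrogate loss plus lam times the mean pairwise gradient cosine.

    `lam` is an illustrative trade-off weight (an assumption of this sketch).
    """
    return sum(losses) / len(losses) + lam * mean_pairwise_cosine(grads)


# Hypothetical gradients g_i and losses from three surrogate models.
grads = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
losses = [0.5, 0.7, 0.6]

print(mean_pairwise_cosine(grads))
print(regularized_objective(losses, grads, lam=0.1))
```

In this toy setting, an attacker maximizing such an objective is rewarded both for raising the average loss and for finding inputs where the surrogate gradients point in similar directions. In a real attack the gradients would come from automatic differentiation through each surrogate model.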

Paper Structure (14 sections, 1 theorem, 12 equations, 10 figures, 2 tables, 2 algorithms)

Key Result

Theorem 1

When $\beta \to 0$, updating by our algorithm is equivalent to optimizing an objective (its equation is not included in this summary), where $\bm{g}_i = \nabla_{\bm{x}} L(f_i({\bm{x}}), {\bm{y}})$, and $\lambda_1$ and $\lambda_2$ are the trade-off hyper-parameters set in our algorithm.

Figures (10)

  • Figure 1: Black-box attack success rate (%) on adversarially trained models by different algorithms. Our method outperforms the existing methods by a large margin. For more detail, refer to the core experimental results.
  • Figure 2: function $f$
  • Figure 3: function $g$
  • Figure 4: function $f+g$
  • Figure 6: Illustration of our method.
  • ...and 5 more figures

Theorems & Definitions (3)

  • Theorem 1
  • Proof 1
  • Remark 2