Improving Adversarial Transferability with Neighbourhood Gradient Information

Haijing Guo, Jiafeng Wang, Zhaoyu Chen, Kaixun Jiang, Lingyi Hong, Pinxue Guo, Jinglun Li, Wenqiang Zhang

TL;DR

This paper introduces Neighbourhood Gradient Information (NGI) as a source of highly transferable gradient signals for black-box adversarial attacks. It proposes NGI-Attack with two mechanisms—Example Backtracking to accumulate NGI and Multiplex Mask to diversify gradient information across non-discriminative regions—delivering high transferability without extra computation. Empirical results on ImageNet show substantial improvements across single-model, ensemble-model, and defense-model settings, including strong performance against robust defenses like RS and DiffPure. The approach is plug-and-play with existing methods and highlights critical considerations for evaluating and strengthening model robustness against transferable adversarial examples.

Abstract

Deep neural networks (DNNs) are known to be susceptible to adversarial examples, leading to significant performance degradation. In black-box attack scenarios, a considerable attack performance gap between the surrogate model and the target model persists. This work focuses on enhancing the transferability of adversarial examples to narrow this performance gap. We observe that the gradient information around the clean image, i.e., Neighbourhood Gradient Information (NGI), can offer high transferability. Based on this insight, we introduce NGI-Attack, incorporating Example Backtracking and Multiplex Mask strategies to exploit this gradient information and enhance transferability. Specifically, we first adopt Example Backtracking to accumulate Neighbourhood Gradient Information as the initial momentum term. Then, we utilize Multiplex Mask to form a multi-way attack strategy that forces the network to focus on non-discriminative regions, yielding richer gradient information within only a few iterations. Extensive experiments demonstrate that our approach significantly enhances adversarial transferability. In particular, when attacking numerous defense models, we achieve an average attack success rate of 95.2%. Notably, our method can seamlessly integrate with any off-the-shelf algorithm, enhancing its attack performance without incurring extra time costs.
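
The two mechanisms named in the abstract can be illustrated with a rough NumPy sketch. It is not the authors' implementation: the toy linear softmax surrogate, the function names, the parameters `warmup`, `keep_prob` and `mu`, the step-size schedule, and the way the two pathways are summed are all assumptions made for illustration. It only shows the general shape of the idea: gradients collected near the clean input seed the momentum term, the example is reset to the clean input, and each later step combines a masked pathway with an unmasked one. It makes no attempt to reproduce the paper's cost accounting.

```python
import numpy as np


def softmax(z):
    z = z - z.max()  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum()


def cross_entropy(W, x, y):
    """Loss of a toy linear softmax classifier with weights W (classes x dims)."""
    return -np.log(softmax(W @ x)[y])


def input_grad(W, x, y, mask=None):
    """Analytic gradient of the loss w.r.t. x, optionally through a 0/1 mask."""
    m = np.ones_like(x) if mask is None else mask
    p = softmax(W @ (x * m))
    p[y] -= 1.0
    return (W.T @ p) * m  # chain rule through the element-wise mask


def l1_normalize(g):
    return g / (np.abs(g).sum() + 1e-12)


def ngi_style_attack(W, x, y, eps=0.1, steps=10, warmup=5,
                     keep_prob=0.9, mu=1.0, seed=0):
    """Hypothetical sketch of a backtracking + masked multi-way attack."""
    rng = np.random.default_rng(seed)
    g = np.zeros_like(x)

    # Phase 1 (assumed): probe the neighbourhood of the clean input and
    # accumulate its gradients into the momentum term.
    x_adv = x.copy()
    alpha = eps / steps
    for _ in range(warmup):
        g = mu * g + l1_normalize(input_grad(W, x_adv, y))
        x_adv = np.clip(x_adv + alpha * np.sign(g), x - eps, x + eps)

    # Backtrack: discard the probe example, keep the accumulated momentum.
    x_adv = x.copy()

    # Phase 2 (assumed): two pathways per step, one masked and one clean,
    # with a larger step so the remaining iterations can still span eps.
    alpha = eps / (steps - warmup)
    for _ in range(steps - warmup):
        mask = (rng.random(x.shape) < keep_prob).astype(x.dtype)
        g_clean = l1_normalize(input_grad(W, x_adv, y))
        g_mask = l1_normalize(input_grad(W, x_adv, y, mask))
        g = mu * g + g_clean + g_mask
        x_adv = np.clip(x_adv + alpha * np.sign(g), x - eps, x + eps)
    return x_adv
```

Calling `ngi_style_attack(W, x, y)` with a weight matrix `W` of shape `(classes, dims)` and an input vector `x` returns a perturbed vector that stays within `eps` of `x` in the L-infinity norm. A real attack would replace the analytic gradient with backpropagation through the surrogate network and would also clip to the valid pixel range.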

Paper Structure

This paper contains 23 sections, 14 equations, 6 figures, 9 tables, 1 algorithm.

Figures (6)

  • Figure 1: Two toy experiments using Inception-v3 as the surrogate model: (a) examines the impact of accumulating Neighbourhood Gradient Information, and (b) validates the effects of direct amplification steps on attack performance.
  • Figure 2: Visualization of the attention of adversarial examples under different strategies with Grad-CAM. For each defense model, the top, middle, and bottom rows show the attention distribution of the clean image, of the adversarial examples generated under the scale-up step strategy, and of the adversarial examples generated under our strategy, respectively. The results demonstrate that our strategy successfully achieves adversarial attacks by effectively disrupting the network's attention, while the scale-up strategy tends to maintain regions of interest similar to those in the clean images across various defense models.
  • Figure 3: The overall NGI-Attack pipeline: The process begins using the Example Backtracking strategy to collect Neighbourhood Gradient Information. This information is then utilized in the Multiplex Mask strategy, forming a multi-way attack. One pathway applies the MaskProcess operation, while the other directly utilizes the clean image. The gradients of these two pathways are collected independently, guided by the Neighbourhood Gradient Information, ultimately generating a highly transferable adversarial example.
  • Figure 4: Visualization of the attention of Inception-v3 on MaskProcess images. Each set comprises two rows; the first represents the clean image and its attention visualization, while the second represents the MaskProcess image with $P$ of 0.9 and its attention visualization. The results show that the MaskProcess image can successfully shift the network's attention towards non-discriminative regions.
  • Figure 5: Attack success rate (%) of input transformation-based attacks on seven models in the single-model setting. The adversarial examples are crafted on Inc-v3.
  • ...and 1 more figure