Enhancing Adversarial Transferability by Balancing Exploration and Exploitation with Gradient-Guided Sampling

Zenghao Niu, Weicheng Xie, Siyang Song, Zitong Yu, Feng Liu, Linlin Shen

TL;DR

This work tackles adversarial transferability by addressing the Exploitation–Exploration trade-off in black-box settings. It introduces Gradient-Guided Sampling (GGS), an inner-iteration strategy that, within a neighborhood of the current example, guides sampling along the gradient from the previous inner-iteration and uses a random-magnitude lookahead, stabilizing ascent toward flatter loss regions with higher local maxima. GGS is shown to be compatible with random-sampling (RS)-based approaches and input-transformation methods, and it delivers superior attack transferability across diverse classifiers, multimodal LLMs, and cloud APIs. Comprehensive loss-surface visualizations and extensive experiments show that GGS reaches a balanced loss landscape, flat enough for cross-model generalization while preserving strong attack potency, thus offering practical, robust improvements for transferable adversarial attacks with modest overhead.
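The contrast between flat and sharp local maxima can be made concrete with a small probe. This is an illustrative sketch only: `toy_loss` is a hypothetical two-peak function standing in for a model's loss, and `flatness_profile` is a hypothetical helper that averages the loss over random unit directions at growing radii. It follows the spirit of the paper's loss-surface plots but is not the authors' visualization code.

```python
import math
import random


def toy_loss(x, y):
    """Hypothetical 2-D loss: a sharp, higher peak at (2, 0) and a flat, lower one at (-2, 0)."""
    sharp = 1.0 * math.exp(-((x - 2) ** 2 + y ** 2) / (2 * 0.1 ** 2))
    flat = 0.9 * math.exp(-((x + 2) ** 2 + y ** 2) / (2 * 1.0 ** 2))
    return sharp + flat


def flatness_profile(loss, point, radii, n_dirs=64, seed=0):
    """Mean loss over random unit directions at each radius around `point`.

    A slowly decaying profile indicates a flat neighbourhood; a fast decay, a sharp one.
    """
    rng = random.Random(seed)
    dirs = []
    for _ in range(n_dirs):
        dx, dy = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        norm = math.hypot(dx, dy)
        dirs.append((dx / norm, dy / norm))  # uniform direction on the unit circle
    profile = []
    for r in radii:
        vals = [loss(point[0] + r * dx, point[1] + r * dy) for dx, dy in dirs]
        profile.append(sum(vals) / len(vals))
    return profile


radii = [0.0, 0.25, 0.5, 1.0]
sharp_profile = flatness_profile(toy_loss, (2.0, 0.0), radii)
flat_profile = flatness_profile(toy_loss, (-2.0, 0.0), radii)
for r, s, f in zip(radii, sharp_profile, flat_profile):
    print(f"r={r:.2f}  sharp peak: {s:.3f}  flat peak: {f:.3f}")
```

On this toy, the sharp peak scores higher at radius 0, but its averaged loss collapses by radius 0.5, while the flat peak loses comparatively little. This is the Exploitation versus Exploration trade-off in miniature.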

Abstract

Adversarial attacks present a critical challenge to the robustness of deep neural networks, particularly in transfer scenarios across different model architectures. However, the transferability of adversarial attacks faces a fundamental dilemma between Exploitation (maximizing attack potency) and Exploration (enhancing cross-model generalization). Traditional momentum-based methods over-prioritize Exploitation, i.e., higher loss maxima for attack potency but weakened generalization (narrow loss surface). Conversely, recent methods with inner-iteration sampling over-prioritize Exploration, i.e., flatter loss surfaces for cross-model generalization but weakened attack potency (suboptimal local maxima). To resolve this dilemma, we propose Gradient-Guided Sampling (GGS), a simple yet effective method that harmonizes both objectives by guiding sampling along the gradient ascent direction, improving both sampling efficiency and stability. Specifically, building on MI-FGSM, GGS introduces inner-iteration random sampling and guides the sampling direction using the gradient from the previous inner-iteration, while the sampling magnitude is drawn from a random distribution. This mechanism encourages adversarial examples to reside in balanced regions that are both flat, for cross-model generalization, and have higher local maxima, for strong attack potency. Comprehensive experiments across multiple DNN architectures and multimodal large language models (MLLMs) demonstrate the superiority of our method over state-of-the-art transfer attacks. Code is available at https://github.com/anuin-cat/GGS.
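The abstract describes GGS as MI-FGSM plus inner-iteration sampling whose direction comes from the previous inner-iteration gradient and whose magnitude is random. Below is a minimal pure-Python sketch of that loop on a hypothetical toy loss rather than a DNN. Several details are assumptions, not the paper's equations. The first inner sample uses plain random sampling (RS). Later samples move from x_t along sign(previous inner gradient) by a per-coordinate magnitude drawn from U(0, zeta*eps). The inner gradients are averaged as in RS (Figure 2(b)). `zeta`, `n_inner`, and the step size alpha = eps/steps are illustrative defaults. The authors' official code is in the GGS repository.

```python
import math
import random


def sign(v):
    """Elementwise sign of a list of floats."""
    return [1.0 if a > 0 else -1.0 if a < 0 else 0.0 for a in v]


def ggs_mifgsm(x0, grad_fn, eps=16 / 255, steps=10, mu=1.0, n_inner=5, zeta=1.5, seed=0):
    """Untargeted L_inf attack: MI-FGSM outer loop with a GGS-style inner loop (sketch)."""
    rng = random.Random(seed)
    alpha = eps / steps          # MI-FGSM step size
    x = list(x0)
    g = [0.0] * len(x0)          # momentum accumulator
    for _ in range(steps):
        avg = [0.0] * len(x0)    # average of the inner-iteration gradients
        prev = None              # gradient from the previous inner iteration
        for _ in range(n_inner):
            if prev is None:
                # Assumption: the first inner sample is plain random sampling (RS).
                sample = [xi + rng.uniform(-zeta * eps, zeta * eps) for xi in x]
            else:
                # Assumption: guided sample = x_t + |r| * sign(previous inner gradient).
                sample = [xi + rng.uniform(0.0, zeta * eps) * si
                          for xi, si in zip(x, sign(prev))]
            prev = grad_fn(sample)
            avg = [a + p / n_inner for a, p in zip(avg, prev)]
        # MI-FGSM: add the L1-normalised gradient to the decayed momentum.
        l1 = sum(abs(a) for a in avg) or 1.0
        g = [mu * gi + a / l1 for gi, a in zip(g, avg)]
        # Signed step, then project onto the eps-ball around x0 and onto [0, 1].
        x = [xi + alpha * si for xi, si in zip(x, sign(g))]
        x = [min(max(xi, x0i - eps, 0.0), x0i + eps, 1.0) for xi, x0i in zip(x, x0)]
    return x


# Hypothetical toy loss: a quadratic pulled toward c plus small ripples.
_rng = random.Random(1)
x0 = [_rng.uniform(0.3, 0.7) for _ in range(8)]
c = [v + 0.5 for v in x0]


def toy_loss(x):
    return sum(-(xi - ci) ** 2 + 0.02 * math.sin(20 * xi) for xi, ci in zip(x, c))


def toy_grad(x):
    return [-2 * (xi - ci) + 0.4 * math.cos(20 * xi) for xi, ci in zip(x, c)]


x_adv = ggs_mifgsm(x0, toy_grad)
print(f"loss before: {toy_loss(x0):.4f}  after: {toy_loss(x_adv):.4f}")
```

Replacing the guided branch with `rng.uniform(-zeta * eps, zeta * eps)` recovers the RS baseline. This gives one way to compare the two strategies under identical settings.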

Paper Structure

This paper contains 32 sections, 5 equations, 12 figures, 12 tables, and 1 algorithm.

Figures (12)

  • Figure 1: The loss surfaces of (a) MI-FGSM dong2018mifgsm (Momentum Iterative Fast Gradient Sign Method); (b) RS (the base inner-iteration Random Sampling defined in the motivation section), which enhances exploration within the neighborhood; (c) PGN ge2023pgn (Penalizing Gradient Norm) with Random Sampling (RS), which stabilizes gradient estimation to enhance exploration; and (d) our GGS (Gradient-Guided Sampling), which samples efficiently to produce gradients that point stably toward flat regions with higher local maxima. Building upon RS, our approach maintains a flat loss surface while also reaching a higher local maximum loss than PGN, achieving a balance between exploration and exploitation.
  • Figure 2: (a) Outer-iteration processes of MI-FGSM (green), RS (purple), and our GGS (red). (i) MI-FGSM prioritizes rapid ascent directions, enhancing exploitation and facilitating access to sharp local maxima regions. (ii) RS introduces inner-iterations and enhances exploration through neighborhood search, enabling access to flat local maxima regions. (iii) GGS incorporates gradient constraints into RS, enhancing both exploration and exploitation and enabling faster convergence to the centers of flat local maxima regions. (b) RS is performed within a neighborhood of the current example and uses the average gradient of all sampled points as the final gradient. (c) Building upon RS, GGS gives examples stable gradient directions toward the centers of flat local maxima regions after a brief initial period of oscillation. It uses the gradient direction from the previous inner-iteration as guidance, while randomness is maintained by drawing the magnitude from a random distribution.
  • Figure 3: Differences among three inner-iteration sampling guidance methods: (a) random sampling, Eq. (rs), where each inner-iteration sample is drawn independently; (b) momentum-guided sampling, Eq. (mrs), where the sampling direction depends on the cumulative average of all previous gradients, creating long-chain dependencies, while random sampling maintains randomness; (c) gradient-guided sampling, Eq. (ggi), where the sampling direction relies solely on the gradient direction of the previous iteration, establishing single-step dependencies.
  • Figure 4: The loss surfaces of adversarial examples generated by different methods (MI-FGSM dong2018mifgsm, GRA zhu2023gra, PGN ge2023pgn, ANDA fang2024anda) on three model architectures (ResNet50 he2016resnet, Inception-v3 szegedy2016incv3, and ViT-B dosovitskiy2020vit), plotted along random directions with varying strengths, following ge2023pgn. Panels marked with a star in the upper-left corner show adversarial examples generated and tested on the same model (white-box testing); unmarked panels show examples generated on one model and tested on another (black-box testing). Regions covered by our loss surface are highlighted with a red background for easier visualization.
  • Figure 5: Loss surfaces of adversarial examples generated by different sampling strategies (Random Sampling (RS), Momentum-Guided Sampling (MGS), and our GGS) as the number of inner-iterations increases, on ResNet50. In panels (a)–(c), the red line on the left marks the maximum value of the loss surface at each iteration, and the blue dashed line highlights these values in the early inner-iterations. Panel (d) shows the cosine similarity between the gradient $\tilde{g}_{i}$ generated in each inner-iteration and the average gradient $\sum_{i=1}^N\tilde{g}_{i}/N$.
  • ...and 7 more figures
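Figures 3 and 5(d) contrast how RS, MGS, and GGS choose the next inner-iteration sample, and how well the resulting gradients agree with their mean. The sketch below encodes the three rules under stated assumptions. `next_sample`, `inner_gradients`, and `alignment_with_mean` are hypothetical helpers. The guided rules are assumed to move along the sign of their guidance gradient with a random per-coordinate magnitude in [0, radius]. Every rule falls back to RS for the first sample because no gradient exists yet. The exact update rules are the equations referenced in Figure 3.

```python
import math
import random


def _sign(v):
    return [1.0 if a > 0 else -1.0 if a < 0 else 0.0 for a in v]


def next_sample(x, history, rule, radius, rng):
    """Next inner-iteration sample around x under RS, MGS, or GGS (sketch)."""
    if rule not in ("RS", "MGS", "GGS"):
        raise ValueError(f"unknown rule: {rule}")
    if rule == "RS" or not history:
        # RS: independent uniform sample in the L_inf neighbourhood of x.
        return [xi + rng.uniform(-radius, radius) for xi in x]
    if rule == "MGS":
        # MGS: direction from the mean of ALL previous inner gradients (long-chain dependency).
        n = len(history)
        direction = _sign([sum(col) / n for col in zip(*history)])
    else:
        # GGS: direction from the LAST inner gradient only (single-step dependency).
        direction = _sign(history[-1])
    # A random magnitude per coordinate keeps the sampling stochastic.
    return [xi + rng.uniform(0.0, radius) * s for xi, s in zip(x, direction)]


def inner_gradients(x, grad_fn, rule, n_inner=8, radius=0.1, seed=0):
    """Run one outer step's inner loop and return the list of inner gradients."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_inner):
        history.append(grad_fn(next_sample(x, history, rule, radius, rng)))
    return history


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = math.sqrt(sum(a * a for a in u)), math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def alignment_with_mean(grads):
    """Cosine similarity of each inner gradient with the mean gradient (cf. Figure 5(d))."""
    n = len(grads)
    mean = [sum(col) / n for col in zip(*grads)]
    return [cosine(g, mean) for g in grads]


def toy_grad(x):
    """Gradient of a hypothetical rippled loss sum(sin(5 * x_j) + x_j)."""
    return [5 * math.cos(5 * xi) + 1 for xi in x]


x = [0.2, 0.5, 0.9]
for rule in ("RS", "MGS", "GGS"):
    sims = alignment_with_mean(inner_gradients(x, toy_grad, rule))
    print(rule, [round(s, 2) for s in sims])
```

In this sketch, the only difference between MGS and GGS is which gradient supplies the direction: the mean of the whole history versus its last entry. This mirrors the long-chain versus single-step dependency described in the Figure 3 caption.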