Improving Integrated Gradient-based Transferable Adversarial Examples by Refining the Integration Path

Yuchen Ren, Zhengyu Zhao, Chenhao Lin, Bo Yang, Lu Zhou, Zhe Liu, Chao Shen

TL;DR

The paper addresses the limited transferability of adversarial examples generated by Integrated Gradients (IG)-based attacks. It refines the IG integration path along three dimensions: multiplicity (multiple baselines), monotonicity (paths enforced by Lower Bound Quantization, LBQ), and diversity (input transformations). It introduces Multiple Monotonic Diversified Integrated Gradients (MuMoDIG), which combines MuIG (IG over multiple baselines) with monotonic LBQ baselines and diversified paths, and adds momentum to further improve transferability. Theoretical analysis clarifies how using IG for interpretability differs from using it for attacks. Empirical results on ImageNet show MuMoDIG achieving up to 37.3% higher transferability than MIG and 8.4% higher than other state-of-the-art attacks across CNNs and ViTs, including defended models and a real-world Baidu Cloud API. The work shows that adapting an established interpretability technique with principled path design can meaningfully strengthen black-box attacks, and it informs defenses about the importance of integration-path choices in IG-based methods. Typical experimental settings are $K=10$, $\epsilon=16$, $\alpha=1.6$, and $\mu=1.0$.

Abstract

Transferable adversarial examples are known to cause threats in practical, black-box attack scenarios. A notable approach to improving transferability is using integrated gradients (IG), originally developed for model interpretability. In this paper, we find that existing IG-based attacks have limited transferability due to their naive adoption of IG in model interpretability. To address this limitation, we focus on the IG integration path and refine it in three aspects: multiplicity, monotonicity, and diversity, supported by theoretical analyses. We propose the Multiple Monotonic Diversified Integrated Gradients (MuMoDIG) attack, which can generate highly transferable adversarial examples on different CNN and ViT models and defenses. Experiments validate that MuMoDIG outperforms the latest IG-based attack by up to 37.3% and other state-of-the-art attacks by 8.4%. In general, our study reveals that migrating established techniques to improve transferability may require non-trivial efforts. Code is available at https://github.com/RYC-98/MuMoDIG.
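
For readers unfamiliar with integrated gradients, the following is a minimal sketch of standard single-baseline IG along a straight path, approximated with a midpoint Riemann sum. It is not the MuMoDIG attack and is not taken from the authors' repository. The function names (`integrated_gradients`, `f`, `grad_f`), the toy quadratic model, and the step count are all assumptions chosen so the result can be checked by hand.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=10):
    """Approximate IG along the straight path from `baseline` to `x`.

    grad_fn: hypothetical callable returning the gradient of a scalar
             model output with respect to its input.
    steps:   number of interpolation points (illustrative value).
    """
    # Midpoints of `steps` equal sub-intervals of [0, 1].
    alphas = (np.arange(steps) + 0.5) / steps
    avg_grad = np.zeros_like(x, dtype=float)
    for a in alphas:
        # Gradient evaluated at an interpolated point on the path.
        avg_grad += grad_fn(baseline + a * (x - baseline))
    avg_grad /= steps
    # Scale the path-averaged gradient by the input-baseline difference.
    return (x - baseline) * avg_grad

# Toy model (an assumption for illustration): f(x) = sum(x^2).
def f(x):
    return float(np.sum(x ** 2))

def grad_f(x):
    return 2.0 * x

x = np.array([1.0, -2.0, 0.5])
baseline = np.zeros_like(x)  # all-zero baseline, i.e. a "black image"
ig = integrated_gradients(grad_f, x, baseline)

# Completeness: attributions sum to f(x) - f(baseline).
print(ig, ig.sum(), f(x) - f(baseline))
```

For this toy model the gradient is linear along the path, so the midpoint rule is exact and the attributions equal `x ** 2`. A real attack would replace `grad_f` with gradients of a surrogate network's loss.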

Paper Structure

This paper contains 17 sections, 1 theorem, 8 equations, 8 figures, 7 tables.

Key Result

Proposition 1

The integration path should be a Monotonic Integration Path when using integrated gradients to generate adversarial examples in transferable attacks.

Figures (8)

  • Figure 1: Model attribution results for adversarial examples generated by our MuMoDIG vs. MIG on the target model PiT-T. MuMoDIG concentrates more on the background, showcasing its superiority in disrupting the model prediction. Here, RN-18 is the surrogate model.
  • Figure 2: (a) MIG adopts a single integration path with a black image baseline. Our (b) MuIG adopts multiple integration paths with arbitrary baselines, with (c) MuMoIG further enforcing monotonicity, (d) MuMoDIG$_\textrm{all}$ diversifying paths and keeping all without enforcing their monotonicity, and (e) MuMoDIG removing non-monotonic diversified paths.
  • Figure 3: How the position of a high-curvature region of the output manifold affects a single integration path.
  • Figure 4: (a) Lower Bound Quantization (LBQ) quantizes all elements in each region to their minimum value, resulting in (b) baseline images that enforce monotonic paths.
  • Figure 5: The cosine similarity calculated between the gradients at 10 interpolated points along a straight path.
  • ...and 3 more figures
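
The Figure 4 caption describes Lower Bound Quantization as mapping every element in a region to that region's minimum value. A rough sketch of that idea follows, under two loudly stated assumptions that this page does not confirm: (1) a "region" is an equal-width intensity interval of pixel values in [0, 1], and (2) a path counts as monotonic when the baseline is element-wise no greater than (or no less than) the input. The names `lower_bound_quantize`, `is_elementwise_monotonic`, and `num_regions` are hypothetical and not taken from the authors' code.

```python
import numpy as np

def lower_bound_quantize(x, num_regions=4):
    """Map each value in [0, 1] to the lower bound of its intensity interval.

    Assumption: regions are `num_regions` equal-width intervals.
    """
    x = np.asarray(x, dtype=float)
    # Interval index per element; clip so x == 1.0 stays in the top interval.
    idx = np.minimum(np.floor(x * num_regions), num_regions - 1)
    return idx / num_regions

def is_elementwise_monotonic(baseline, x):
    """Assumed criterion: every coordinate moves in the same direction
    (all non-decreasing or all non-increasing) from baseline to x."""
    return bool(np.all(baseline <= x) or np.all(baseline >= x))

image = np.array([[0.0, 0.3], [0.5, 1.0]])
baseline = lower_bound_quantize(image)

print(baseline)                                   # quantized baseline image
print(is_elementwise_monotonic(baseline, image))  # True under the assumption

# A baseline that is above the image in one pixel and below it in another.
mixed = np.array([[0.1, 0.2], [0.5, 0.75]])
print(is_elementwise_monotonic(mixed, image))
```

Because each value is rounded down within its interval, the quantized baseline never exceeds the input, which is why it satisfies the assumed monotonicity criterion by construction.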

Theorems & Definitions (2)

  • Definition 1: Monotonic Integration Path
  • Proposition 1