GE-AdvGAN: Improving the transferability of adversarial samples by gradient editing-based adversarial generative model

Zhiyu Zhu, Huaming Chen, Xinyi Wang, Jiayu Zhang, Zhibo Jin, Kim-Kwang Raymond Choo, Jun Shen, Dong Yuan

TL;DR

This work tackles the challenge of transferring adversarial examples across models in black-box settings by introducing GE-AdvGAN, which applies a gradient-editing mechanism during generator training. The method uses frequency-domain information, obtained via the Discrete Cosine Transform, to guide the gradient edits, producing highly transferable samples more efficiently; this is supported by large-scale experiments and a favorable generation speed in frames per second (FPS) compared with existing baselines. Core contributions include the gradient-editing framework, a frequency-domain exploration strategy to determine editing directions, and extensive ablations validating parameter effects. The approach supports practical robustness evaluation and improves attack transferability, offering a tool for studying model vulnerabilities and defenses in real-world scenarios. The code is open source.

Abstract

Adversarial generative models, such as Generative Adversarial Networks (GANs), are widely applied for generating various types of data, e.g., images, text, and audio. Accordingly, their promising performance has led to GAN-based adversarial attack methods in both white-box and black-box attack scenarios. The importance of transferable black-box attacks lies in their ability to be effective across different models and settings, more closely aligning with real-world applications. However, it remains challenging for such methods to retain their performance when generating transferable adversarial examples. Meanwhile, we observe that some enhanced gradient-based transferable adversarial attack algorithms require prolonged time for adversarial sample generation. Thus, in this work, we propose a novel algorithm named GE-AdvGAN to enhance the transferability of adversarial samples whilst improving the algorithm's efficiency. The main approach is to optimise the training process of the generator parameters. With the functional and characteristic similarity analysis, we introduce a novel gradient editing (GE) mechanism and verify its feasibility in generating transferable samples on various models. Moreover, by exploring the frequency domain information to determine the gradient editing direction, GE-AdvGAN can generate highly transferable adversarial samples while minimising the execution time in comparison to the state-of-the-art transferable adversarial attack algorithms. The performance of GE-AdvGAN is comprehensively evaluated by large-scale experiments on different datasets, and the results demonstrate the superiority of our algorithm. The code for our algorithm is available at: https://github.com/LMBTough/GE-advGAN
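
The abstract states only that frequency-domain information is used to determine the gradient editing direction. As a rough illustration of what such a step could look like, the sketch below perturbs an image in the DCT domain several times, evaluates a loss gradient at each perturbed copy, and keeps the sign of the average. Everything specific here is an assumption made for illustration and is not taken from the paper or its repository: the linear surrogate model and its logistic loss (`toy_loss_grad`), the Gaussian noise and random spectral mask, the parameter names `n_samples`, `sigma`, and `rho`, and the sign convention of the returned direction.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C of size n x n (C @ C.T is the identity)."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)    # rescale the DC row so all rows have unit norm
    return c


def dct2(x: np.ndarray, c: np.ndarray) -> np.ndarray:
    """2-D DCT of a square image x."""
    return c @ x @ c.T


def idct2(spec: np.ndarray, c: np.ndarray) -> np.ndarray:
    """Inverse 2-D DCT (C is orthonormal, so its transpose is its inverse)."""
    return c.T @ spec @ c


def toy_loss_grad(x: np.ndarray, w: np.ndarray, y: float) -> np.ndarray:
    """Gradient w.r.t. x of a logistic loss for a HYPOTHETICAL linear model.

    Stands in for back-propagation through a real surrogate classifier.
    """
    p = 1.0 / (1.0 + np.exp(-np.sum(w * x)))  # predicted probability
    return (p - y) * w


def edited_gradient(x, w, y, n_samples=10, sigma=0.1, rho=0.5, seed=0):
    """Sign of the loss gradient averaged over frequency-perturbed copies of x.

    n_samples, sigma and rho are illustrative names and values (assumptions).
    """
    rng = np.random.default_rng(seed)
    c = dct_matrix(x.shape[0])
    grad_sum = np.zeros_like(x)
    for _ in range(n_samples):
        noise = rng.normal(0.0, sigma, x.shape)         # spatial-domain noise
        mask = rng.uniform(1 - rho, 1 + rho, x.shape)   # random spectral rescaling
        x_f = idct2(dct2(x + noise, c) * mask, c)       # back to pixel space
        grad_sum += toy_loss_grad(x_f, w, y)            # gradient at the perturbed copy
    return np.sign(grad_sum / n_samples)
```

Calling `edited_gradient(x, w, y=1.0)` on a square array `x` with a weight array `w` of the same shape returns an array of the same shape with entries in {-1, 0, 1}. How such a direction would replace the original gradient path during generator training is not described in this section, so the sketch stops at computing the direction.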

Paper Structure (35 sections, 24 equations, 2 figures, 3 tables)

Figures (2)

  • Figure 1: GE-AdvGAN schematic diagram. The red line represents the original path; we use the blue $ge$ path in its place.
  • Figure 2: GE-AdvGAN attack success rate with different parameters