Improving the Transferability of Adversarial Examples by Inverse Knowledge Distillation

Wenyuan Wu, Zheng Liu, Yong Chen, Chao Su, Dezhong Peng, Xu Wang

TL;DR

By diversifying gradients, IKD enables the generation of adversarial samples with superior generalization capabilities across different models, significantly enhancing their effectiveness in black-box attack scenarios.

Abstract

In recent years, the rapid development of deep neural networks has drawn increased attention to the security and robustness of these models. Although existing adversarial attack algorithms have improved adversarial transferability, their performance remains suboptimal because they do not account for the discrepancies between the source (surrogate) and target models. To address this limitation, we propose Inverse Knowledge Distillation (IKD), a novel method for enhancing adversarial transferability. IKD introduces a distillation-inspired loss function that integrates with gradient-based attack methods, promoting diversity in attack gradients and mitigating overfitting to specific model architectures. By diversifying gradients, IKD generates adversarial samples that generalize better across models, significantly improving their effectiveness in black-box attack scenarios. Extensive experiments on the ImageNet dataset validate the effectiveness of our approach, demonstrating substantial improvements in the transferability and attack success rates of adversarial samples across a wide range of models.
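The abstract does not give the exact IKD loss, so the following is only a minimal sketch of the general pattern it describes: adding a distillation-inspired term to an iterative sign-gradient (I-FGSM-style) attack. Every specific here is an assumption, not the paper's method: a toy linear softmax surrogate with hand-derived input gradients (so only the Python standard library is needed), a temperature-softened KL divergence from the clean prediction as the stand-in extra term, and the hypothetical names `iterative_attack`, `ikd_weight`, and `temperature`.

```python
import math


def softmax(z, temperature=1.0):
    """Numerically stable softmax of logits z divided by a temperature."""
    scaled = [v / temperature for v in z]
    m = max(scaled)
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def logits(W, b, x):
    """Toy linear surrogate model: z = W x + b."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def input_grad(W, dz):
    """Chain rule for the linear model: dL/dx = W^T dL/dz."""
    return [sum(W[k][j] * dz[k] for k in range(len(W))) for j in range(len(W[0]))]


def cross_entropy(W, b, x, y):
    """Standard cross-entropy of the surrogate's prediction for label y."""
    return -math.log(softmax(logits(W, b, x))[y])


def iterative_attack(W, b, x, y, eps=0.1, steps=10, ikd_weight=0.5, temperature=4.0):
    """Hypothetical sketch: I-FGSM on CE + ikd_weight * KL(soft_clean || softmax(z_adv / T)).

    The KL term is an illustrative stand-in for a distillation-inspired loss,
    NOT the loss defined in the IKD paper.
    """
    alpha = eps / steps
    # Temperature-softened clean prediction, kept fixed as the reference ("teacher").
    soft_clean = softmax(logits(W, b, x), temperature)
    x_adv = list(x)
    for _ in range(steps):
        z = logits(W, b, x_adv)
        # d CE / d z = softmax(z) - onehot(y)
        dz_ce = [p - (1.0 if k == y else 0.0) for k, p in enumerate(softmax(z))]
        # d KL(soft_clean || softmax(z / T)) / d z = (softmax(z / T) - soft_clean) / T
        p_t = softmax(z, temperature)
        dz_kl = [(pt - q) / temperature for pt, q in zip(p_t, soft_clean)]
        # Combined loss gradient in logit space, then mapped back to the input.
        dz = [a + ikd_weight * c for a, c in zip(dz_ce, dz_kl)]
        g = input_grad(W, dz)
        # Sign step that increases the combined loss.
        x_adv = [xa + alpha * ((gi > 0) - (gi < 0)) for xa, gi in zip(x_adv, g)]
        # Project back into the L-infinity ball around x and the valid range [0, 1].
        x_adv = [min(max(xa, xo - eps), xo + eps) for xa, xo in zip(x_adv, x)]
        x_adv = [min(max(xa, 0.0), 1.0) for xa in x_adv]
    return x_adv


# Usage on a made-up 3-class, 4-feature surrogate (illustrative values only).
W = [[1.0, -0.5, 0.3, 0.0],
     [-0.4, 0.8, -0.2, 0.5],
     [0.2, 0.1, 0.9, -0.7]]
b = [0.0, 0.0, 0.0]
x = [0.6, 0.2, 0.4, 0.5]
x_adv = iterative_attack(W, b, x, y=0, eps=0.1, steps=10, ikd_weight=0.5)
print(cross_entropy(W, b, x, 0), cross_entropy(W, b, x_adv, 0))
```

Setting `ikd_weight=0.0` reduces the sketch to plain I-FGSM on the surrogate, which is a natural baseline when sweeping the weight. Note that the KL term's gradient is zero at the first step (the adversarial input still equals the clean one) and only steers later steps; the toy model says nothing about transferability to other models.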

Paper Structure

This paper contains 25 sections, 7 equations, 3 figures, 7 tables, and 1 algorithm.

Figures (3)

  • Figure 1: The distinction between the Inverse Knowledge Distillation (IKD) method and existing gradient-based attack approaches. The blue arrow indicates the gradient direction of traditional gradient-based attack methods, which rely heavily on the decision boundary of the surrogate model (i.e., the gradient is closely tied to the surrogate model's parameters). As a result, these methods may fail to attack the target model, because the adversarial samples they generate often cannot cross the target model's decision boundary, represented by the orange curve. In contrast, by optimizing the attack gradient direction (orange dashed line), IKD not only ensures that the adversarial sample crosses the surrogate model's decision boundary but also increases the likelihood that it crosses the target model's decision boundary. Consequently, IKD achieves a higher success rate in transfer-based attacks.
  • Figure 2: Effect of different IKD methods on the transfer-based average attack success rate.
  • Figure 3: Effect of IKD weight on the transfer-based average attack success rate.