Improving the Transferability of Adversarial Examples by Inverse Knowledge Distillation

Wenyuan Wu; Zheng Liu; Yong Chen; Chao Su; Dezhong Peng; Xu Wang

TL;DR

By diversifying gradients, IKD enables the generation of adversarial samples with superior generalization capabilities across different models, significantly enhancing their effectiveness in black-box attack scenarios.

Abstract

In recent years, the rapid development of deep neural networks has brought increased attention to the security and robustness of these models. While existing adversarial attack algorithms have improved adversarial transferability, their performance remains suboptimal because they do not account for the discrepancies between source and target models. To address this limitation, we propose a novel method, Inverse Knowledge Distillation (IKD), designed to enhance adversarial transferability. IKD introduces a distillation-inspired loss function that integrates with gradient-based attack methods, promoting diversity in attack gradients and mitigating overfitting to specific model architectures. By diversifying gradients, IKD enables the generation of adversarial samples that generalize better across different models, significantly enhancing their effectiveness in black-box attack scenarios. Extensive experiments on the ImageNet dataset validate the effectiveness of our approach, demonstrating substantial improvements in the transferability and attack success rates of adversarial samples across a wide range of models.
