AIM: Additional Image Guided Generation of Transferable Adversarial Attacks

Teng Li, Xingjun Ma, Yu-Gang Jiang

TL;DR

This work tackles the vulnerability of DNNs to transferable adversarial attacks, focusing on targeted transferability, which is harder to achieve than untargeted transferability. It introduces a plug-and-play Semantic Injection Module (SIM) that injects guiding semantics from an auxiliary image into a general adversarial generator. This yields a two-input generation process, $x_{adv}=G((x,x_{guide}),\theta_g)$, trained with new losses tailored to the targeted and untargeted settings. The key contributions are the SIM architecture, logit-contrastive and mid-layer similarity losses for targeted attacks, a set of untargeted losses that leverages multiple guiding images, and extensive cross-architecture and cross-domain evaluations showing substantial improvements in targeted transferability and competitive untargeted performance. The approach advances understanding of semantically guided adversarial generation and has implications for evaluating robustness and informing defense strategies against transferable attacks.

Abstract

Transferable adversarial examples highlight the vulnerability of deep neural networks (DNNs) to imperceptible perturbations across various real-world applications. While there have been notable advancements in untargeted transferable attacks, targeted transferable attacks remain a significant challenge. In this work, we focus on generative approaches for targeted transferable attacks. Current generative attacks focus on reducing overfitting to surrogate models and the source data domain, but they often overlook the importance of enhancing transferability through additional semantics. To address this issue, we introduce a novel plug-and-play module into the general generator architecture to enhance adversarial transferability. Specifically, we propose a Semantic Injection Module (SIM) that utilizes the semantics contained in an additional guiding image to improve transferability. The guiding image provides a simple yet effective method to incorporate target semantics from the target class to create targeted and highly transferable attacks. Additionally, we propose new loss formulations that can integrate the semantic injection module more effectively for both targeted and untargeted attacks. We conduct comprehensive experiments under both targeted and untargeted attack settings to demonstrate the efficacy of our proposed approach.

Paper Structure

This paper contains 24 sections, 7 equations, 2 figures, and 6 tables.

Figures (2)

  • Figure 1: Our framework introduces a novel semantic injection module (SIM) into the adversarial generator $G \left( \left( \cdot, \cdot \right),\theta_{g} \right)$. The generator takes a source image $x$ and a guiding image $x_{\text{guide}}$ as inputs and outputs an adversarial example $x_{\text{adv}}$. The SIM component utilizes the feature map from the previous layer and the guiding image $x_{\text{guide}}$ to produce an enhanced feature map that incorporates the semantics from the guiding image. For targeted attacks (Tar.), we define the training objectives using logit contrastive loss and mid-layer similarity loss, which direct the adversarial example $x_{\text{adv}}$ towards the target guiding image $x_{\text{guide}}$ in both the logit and feature spaces. For untargeted attacks (Untar.), we introduce an enhanced mid-layer similarity loss to push $x_{\text{adv}}$ away from both the clean image $x$ and the guiding image $x_{\text{guide}}$ in the feature space.
  • Figure 2: Illustration of attention shift. We use Grad-CAM visualization of adversarial examples in the targeted attack setting. The adversarial examples were generated using ResNet-152 as the surrogate model, with evaluations conducted on ResNet-50 as the target model.
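
The Figure 1 caption describes mid-layer similarity objectives that pull $x_{\text{adv}}$ towards $x_{\text{guide}}$ in feature space for targeted attacks, and push it away from both $x$ and $x_{\text{guide}}$ for untargeted attacks. The sketch below illustrates that pull/push idea only. It assumes cosine similarity as the similarity measure and flat feature vectors as inputs. The function names and the exact form of each loss are hypothetical and are not taken from the paper, whose actual formulations may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for degenerate (all-zero) features
    return dot / (norm_a * norm_b)


def targeted_mid_layer_loss(f_adv, f_guide):
    """Hypothetical targeted objective (assumption, not the paper's formula).

    Minimizing this value pulls the adversarial features towards the
    guiding-image features: it is 0 when they are perfectly aligned.
    """
    return 1.0 - cosine_similarity(f_adv, f_guide)


def untargeted_mid_layer_loss(f_adv, f_clean, f_guide):
    """Hypothetical untargeted objective (assumption, not the paper's formula).

    Minimizing this value pushes the adversarial features away from both
    the clean-image features and the guiding-image features.
    """
    return cosine_similarity(f_adv, f_clean) + cosine_similarity(f_adv, f_guide)
```

In a real training loop the feature vectors would come from a middle layer of the surrogate model, and these terms would be combined with the logit-level loss. Here they are plain Python lists so that the direction of each objective is easy to check by hand.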