Towards Cross-Domain Multi-Targeted Adversarial Attacks

Taïga Gonçalves, Tomo Miyazaki, Shinichiro Omachi

TL;DR

CD-MTA tackles cross-domain targeted adversarial attacks without access to the victim's training data by conditioning perturbations on a single target image and guiding the generator with class-agnostic feature objectives. It introduces a Feature Injection Module (FIM) that blends source and target features using SPADE, and it enforces a dual objective: $L_{feat}$ to align intermediate features and $L_{fr}$ to reconstruct target features, formalized through a loss like $\min_{\delta} \ell(f(x_s + \delta), f(x_t))$ with a budget constraint on $\|\delta\|$. The approach eliminates data leakage and demonstrates state-of-the-art performance on unseen target classes across ImageNet and seven additional datasets, including cross-domain transfers, without target-domain training. These findings reveal critical security concerns for privately trained models and motivate the development of stronger defenses against leakage-free, cross-domain targeted attacks.
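
The objective above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: the budget is taken to be an L-infinity ball of radius `eps` (set here to 16/255), the loss $\ell$ is taken to be a cosine distance, and `toy_features` is a hypothetical stand-in for the feature extractor $f$. The function names `project`, `feature_distance` and `toy_features` are likewise invented for this example.

```python
import numpy as np

def project(x_s, delta, eps):
    """Assumed projection: clip delta to an L-inf ball of radius eps,
    then keep the perturbed image inside the valid pixel range [0, 1]."""
    delta = np.clip(delta, -eps, eps)
    return np.clip(x_s + delta, 0.0, 1.0)

def feature_distance(z_adv, z_tgt):
    """Assumed class-agnostic loss: 1 - cosine similarity of the
    flattened feature maps (no labels or class logits involved)."""
    a, b = z_adv.ravel(), z_tgt.ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b) + 1e-12
    return 1.0 - float(a @ b) / denom

def toy_features(x):
    """Hypothetical stand-in for f(.): 2x2 average pooling.
    A real attack would use intermediate features of a CNN."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

rng = np.random.default_rng(0)
x_s = rng.random((3, 8, 8))                      # source image in [0, 1)
x_t = rng.random((3, 8, 8))                      # single target example
delta = rng.normal(scale=0.1, size=x_s.shape)    # raw generator output
eps = 16 / 255                                   # assumed budget

x_adv = project(x_s, delta, eps)
loss = feature_distance(toy_features(x_adv), toy_features(x_t))
```

In this sketch the loss depends only on feature maps of the perturbed and target images, which is what makes the objective independent of the victim's label set.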

Abstract

Multi-targeted adversarial attacks aim to mislead classifiers toward specific target classes using a single perturbation generator with a conditional input specifying the desired target class. Existing methods face two key limitations: (1) a single generator supports only a limited number of predefined target classes, and (2) it requires access to the victim model's training data to learn target class semantics. This dependency raises data leakage concerns in practical black-box scenarios where the training data is typically private. To address these limitations, we propose a novel Cross-Domain Multi-Targeted Attack (CD-MTA) that can generate perturbations toward arbitrary target classes, even those that do not exist in the attacker's training data. CD-MTA is trained on a single public dataset but can perform targeted attacks on black-box models trained on different datasets with disjoint and unknown class sets. Our method requires only a single example image that visually represents the desired target class, without relying on its label, class distribution or pretrained embeddings. We achieve this through a Feature Injection Module (FIM) and class-agnostic objectives which guide the generator to extract transferable, fine-grained features from the target image without inferring class semantics. Experiments on ImageNet and seven additional datasets show that CD-MTA outperforms existing multi-targeted attack methods on unseen target classes in black-box and cross-domain scenarios. The code is available at https://github.com/tgoncalv/CD-MTA.

Paper Structure

This paper contains 22 sections, 9 equations, 10 figures, 9 tables.

Figures (10)

  • Figure 1: Illustration of the data leakage problem in existing cross-domain targeted attacks [TTP, TTAA, CD-AP, CGNC, FDA]. These methods are traditionally considered cross-domain as long as the input images $x_s$ of the perturbation generator come from different datasets during training and evaluation. However, existing methods still use private information from the target dataset during training (such as ground-truth labels and pretrained white-box classifiers), leading to unintended data leakage. As a result, these attacks are bound to a specific source-target dataset pair and cannot be directly applied to other datasets without retraining. This contradicts the initial goal of cross-domain attacks, which is to generalize to unknown datasets. In contrast, our method eliminates data leakage by removing all components and losses that depend on the target dataset.
  • Figure 2: Overview of CD-MTA. The Feature Injection Module (FIM) merges the source and target images and produces a perturbation $\delta$. Then, an adversarial example $x_s'$ is formed using the projection function $P(\cdot,\epsilon)$, as in \ref{eq:perturbed_img_formula}. During training, a white-box classifier extracts the intermediate feature maps and computes the feature loss $L_{feat}$, while the encoder $E$ is reused to compute the feature reconstruction loss $L_{fr}$.
  • Figure 3: Feature Injection Module (FIM): the target information is injected into the source using SPADE normalization [SPADE].
  • Figure 4: Illustration of the Feature Reconstruction Objective (FRO). The perturbation generator is trained to reconstruct the source data $x_s$ in the visual space using $P(\cdot, \epsilon)$ as in \ref{eq:perturbed_img_formula}, while simultaneously reconstructing the target data $z_t^G$ in the feature space using the feature reconstruction loss $L_{fr}$ in \ref{eq:FRO}.
  • Figure 5: Visualization of adversarial examples generated by CD-MTA using unseen ImageNet classes. Each subfigure shows the source, target and perturbed images (top row), with their corresponding feature maps extracted from VGG19 and averaged across channels (bottom row). The perturbed images are aligned with the target images in the feature space, demonstrating the ability of CD-MTA to generate targeted adversarial examples without relying on class semantics.
  • ...and 5 more figures
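
The SPADE-based injection described in the Figure 3 caption can be sketched as follows. This is a simplified illustration under stated assumptions, not the FIM architecture from the paper. The source features are normalized per channel, and the scale and shift maps are predicted from the target features with 1x1 channel-mixing weights (`w_gamma`, `w_beta`), which are hypothetical stand-ins for the learned convolutions. The function name `spade_inject` and all shapes are invented for this example.

```python
import numpy as np

def spade_inject(z_src, z_tgt, w_gamma, w_beta, eps=1e-5):
    """SPADE-style modulation of source features by target features.

    z_src, z_tgt: feature maps of shape (C, H, W).
    w_gamma, w_beta: (C, C) matrices acting as 1x1 convolutions
    (an assumption; a learned module would use trained conv layers).
    """
    # Parameter-free normalization of each source channel over space.
    mean = z_src.mean(axis=(1, 2), keepdims=True)
    std = z_src.std(axis=(1, 2), keepdims=True)
    z_norm = (z_src - mean) / (std + eps)
    # Per-pixel scale and shift maps computed from the target features.
    gamma = np.einsum("oc,chw->ohw", w_gamma, z_tgt)
    beta = np.einsum("oc,chw->ohw", w_beta, z_tgt)
    # Spatially varying affine transform of the normalized source.
    return z_norm * (1.0 + gamma) + beta

rng = np.random.default_rng(0)
z_src = rng.normal(size=(4, 6, 6))
z_tgt = rng.normal(size=(4, 6, 6))
w_gamma = 0.1 * rng.normal(size=(4, 4))
w_beta = 0.1 * rng.normal(size=(4, 4))

z_mixed = spade_inject(z_src, z_tgt, w_gamma, w_beta)
```

Because `gamma` and `beta` vary per pixel, the target's spatial layout is carried into the source features rather than being collapsed into a single class-level code.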