Seeing Isn't Believing: Context-Aware Adversarial Patch Synthesis via Conditional GAN

Roie Kazoom, Alon Goldberg, Hodaya Cohen, Ofer Hadar

TL;DR

This work presents a targeted, realism-aware adversarial patch synthesis framework that operates under strict black-box conditions. By conditioning the patch generator on real input images and guiding patch placement with Grad-CAM from a surrogate model, the method achieves precise target misclassification while maintaining visual plausibility. A multi-objective loss combining adversarial, patch-consistency, and perceptual terms yields high attack success and robust realism across CNNs and Vision Transformers, with strong transferability and robustness against defenses. The approach exposes practical vulnerabilities in modern vision systems and establishes a new benchmark for realistic, context-aware adversarial patches.
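
To make "Grad-CAM-guided placement" concrete, here is a minimal NumPy sketch. It assumes the surrogate's Grad-CAM heatmap is already available as a 2-D array at image resolution, and it assumes a simple placement rule: choose the square window whose summed saliency is largest. The summary does not state the paper's exact rule, so the rule and the helper names `best_patch_location` and `apply_patch` are hypothetical. `apply_patch` is one plausible reading of the overlay operator $x \oplus \delta$, namely overwriting the chosen region with the patch.

```python
import numpy as np


def best_patch_location(heatmap: np.ndarray, patch_size: int) -> tuple[int, int]:
    """Return the (row, col) top-left corner of the patch_size x patch_size
    window with the largest summed saliency in `heatmap`.

    Hypothetical rule: maximize total Grad-CAM mass covered by the patch.
    """
    h, w = heatmap.shape
    if not 1 <= patch_size <= min(h, w):
        raise ValueError("patch_size must be between 1 and the heatmap size")
    # Integral image padded with a zero row/column, so each window sum is O(1).
    integral = np.zeros((h + 1, w + 1), dtype=np.float64)
    integral[1:, 1:] = heatmap.cumsum(axis=0).cumsum(axis=1)
    k = patch_size
    # window_sums[i, j] == heatmap[i:i+k, j:j+k].sum() for every valid corner.
    window_sums = (
        integral[k:, k:] - integral[:-k, k:] - integral[k:, :-k] + integral[:-k, :-k]
    )
    r, c = np.unravel_index(np.argmax(window_sums), window_sums.shape)
    return int(r), int(c)


def apply_patch(image: np.ndarray, patch: np.ndarray, row: int, col: int) -> np.ndarray:
    """Sketch of x ⊕ δ: return a copy of `image` with `patch` pasted at (row, col)."""
    out = image.copy()
    ph, pw = patch.shape[:2]
    out[row:row + ph, col:col + pw] = patch
    return out


# Usage: a toy 8x8 heatmap whose salient 2x2 block starts at (5, 2).
heatmap = np.zeros((8, 8))
heatmap[5:7, 2:4] = 1.0
row, col = best_patch_location(heatmap, 2)
x_adv = apply_patch(np.zeros((8, 8, 3)), np.ones((2, 2, 3)), row, col)
```

The integral-image trick evaluates every candidate window in a single vectorized pass instead of re-summing each window. Tying placement to the surrogate's heatmap is what lets the attack stay black-box with respect to the victim.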

Abstract

Adversarial patch attacks pose a severe threat to deep neural networks, yet most existing approaches rely on unrealistic white-box assumptions, pursue untargeted objectives, or produce visually conspicuous patches that limit real-world applicability. In this work, we introduce a novel framework for fully controllable adversarial patch generation, in which the attacker freely chooses both the input image $x$ and the target class $y_{\mathrm{target}}$, thereby dictating the exact misclassification outcome. Our method combines a generative U-Net design with Grad-CAM-guided patch placement, enabling semantics-aware localization that maximizes attack effectiveness while preserving visual realism. Extensive experiments across convolutional networks (DenseNet-121, ResNet-50) and vision transformers (ViT-B/16, Swin-B/16, among others) demonstrate that our approach achieves state-of-the-art performance across all settings, with attack success rate (ASR) and target-class success (TCS) consistently exceeding 99%. Importantly, our method not only outperforms prior white-box attacks and untargeted baselines but also surpasses existing non-realistic approaches that produce detectable artifacts. By simultaneously ensuring realism, targeted control, and black-box applicability (the three most challenging dimensions of patch-based attacks), our framework establishes a new benchmark for adversarial robustness research, bridging the gap between theoretical attack strength and practical stealthiness.
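
The two headline metrics can be sketched in a few lines. The definitions below are assumptions and are not taken from the paper. ASR is taken as the share of adversarial inputs whose prediction differs from the true label. TCS is taken as the share whose prediction equals the attacker's chosen target. The paper may instead compute ASR only over images the model originally classified correctly. The function name `attack_metrics` is hypothetical.

```python
import numpy as np


def attack_metrics(pred_adv, y_true, y_target) -> tuple[float, float]:
    """Return (ASR, TCS) in percent under the assumed definitions.

    ASR: fraction of adversarial predictions != true label (untargeted success).
    TCS: fraction of adversarial predictions == attacker's target (targeted success).
    """
    pred_adv = np.asarray(pred_adv)
    y_true = np.asarray(y_true)
    y_target = np.asarray(y_target)
    asr = 100.0 * np.mean(pred_adv != y_true)
    tcs = 100.0 * np.mean(pred_adv == y_target)
    return float(asr), float(tcs)


# Usage: four images, all attacked toward target class 3.
asr, tcs = attack_metrics(pred_adv=[3, 5, 1, 3], y_true=[1, 2, 1, 0], y_target=[3, 3, 3, 3])
```

Under these definitions, TCS can never exceed ASR whenever the target differs from the true label. A targeted hit is always also a misclassification, but the reverse does not hold. In the toy example, the second image is misclassified as class 5 instead of the target, so it counts toward ASR but not TCS.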

Paper Structure

This paper contains 24 sections, 32 equations, 5 figures, 5 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Overall attack pipeline. Given an input image $x$, we first extract Grad-CAM heatmaps from a surrogate ResNet-50 to localize semantically salient regions. A U-Net generator $G$ consumes the seed patch $\delta$ to synthesize an adversarial patch $G(\delta)$. The patch is placed on $x$ to form $x_{\mathrm{adv}}$, which is then fed to the black-box victim model. We jointly optimize three losses: (1) the adversarial loss $L_{\mathrm{adv}}=-\log P(y_{\mathrm{target}}\mid x\oplus G(\delta))$, (2) the pixel-level patch-consistency loss $L_{\mathrm{patch}}=\mathbb{E}_{\delta}\|G(\delta)-\delta\|_{2}$, and (3) the deep-feature perceptual loss $L_{\mathrm{perc}}=\|\phi(G(\delta))-\phi(\delta)\|_{2}$, where $\phi$ denotes features from a frozen VGG16.
  • Figure 2: Targeted adversarial patch attack framework. An input image $x$ is overlaid with an attacker-chosen patch $\delta$ (highlighted in blue), producing an adversarial example $x \oplus \delta$. The adversarial input is passed to a black-box model, which is forced to predict an attacker-specified target class $y_{\mathrm{target}}$ (highlighted in red). This figure emphasizes the two degrees of attacker control: (1) designing the adversarial patch $\delta$, and (2) selecting the desired misclassification target $y_{\mathrm{target}}$. Arrows indicate the attack flow from clean input to adversarial output.
  • Figure 3: Comparison of adversarial patch attacks with a patch size of $64 \times 64$. The plots show Attack Success Rate (ASR) and Target-Class Success (TCS), both reported in percentage. For each method, the scatter marker represents the measured value, while the badges below indicate the attack properties: Targeted, Realistic, and Black-box. Our method combines all three challenging properties simultaneously and still achieves the strongest performance across both metrics, highlighting robustness under the most difficult attack setting.
  • Figure 4: Adversarial patch examples. Left: clean input images. Right: realistic texture-preserving adversarial patches generated by our method, which achieve targeted attacks without significantly altering the visual content.
  • Figure 5: Effect of patch size on attack success rate (ASR) and target-class success (TCS) for ResNet on ImageNet. Both ASR and TCS increase with patch size, converging to nearly $100\%$ success for $64\times 64$ and larger patches.
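
The three-term objective in Figure 1 can be written as a single scalar. The NumPy sketch below computes its value for a batch, and it rests on several assumptions. The terms are combined as $L = L_{\mathrm{adv}} + \lambda_{\mathrm{patch}} L_{\mathrm{patch}} + \lambda_{\mathrm{perc}} L_{\mathrm{perc}}$, with hypothetical weights `lam_patch` and `lam_perc` because the summary gives no values. Logits for $x \oplus G(\delta)$ are supplied by whichever model provides $P(y_{\mathrm{target}} \mid \cdot)$. The VGG16 features $\phi(\cdot)$ are precomputed and passed in. The expectations are batch means. Actual training would need an autodiff framework to obtain gradients with respect to $G$. This block only illustrates how the terms fit together.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def total_loss(logits_adv, y_target, g_delta, delta, feat_g, feat_delta,
               lam_patch=1.0, lam_perc=1.0):
    """Value of L_adv + lam_patch * L_patch + lam_perc * L_perc, batch-averaged.

    logits_adv:          (B, K) logits for x ⊕ G(δ)
    y_target:            (B,) integer target classes
    g_delta, delta:      (B, ...) generated patch G(δ) and seed patch δ
    feat_g, feat_delta:  (B, ...) features φ(G(δ)) and φ(δ), assumed precomputed
    lam_patch, lam_perc: hypothetical weights, not values from the paper
    """
    logits_adv = np.asarray(logits_adv, dtype=np.float64)
    y_target = np.asarray(y_target, dtype=np.int64)
    b = logits_adv.shape[0]
    # L_adv = -log P(y_target | x ⊕ G(δ)), averaged over the batch.
    l_adv = -log_softmax(logits_adv)[np.arange(b), y_target].mean()
    # L_patch = E_δ ||G(δ) - δ||_2: per-sample L2 norm of the pixel difference.
    l_patch = np.linalg.norm((g_delta - delta).reshape(b, -1), axis=1).mean()
    # L_perc = ||φ(G(δ)) - φ(δ)||_2, also averaged over the batch here.
    l_perc = np.linalg.norm((feat_g - feat_delta).reshape(b, -1), axis=1).mean()
    total = l_adv + lam_patch * l_patch + lam_perc * l_perc
    return float(total), (float(l_adv), float(l_patch), float(l_perc))


# Usage: uniform logits over 4 classes and an unchanged patch leave only L_adv = log 4.
patch = np.zeros((1, 2, 2, 3))
feats = np.zeros((1, 8))
loss, parts = total_loss(np.zeros((1, 4)), [2], patch, patch, feats, feats)
```

With the two consistency terms at zero, the loss reduces to the cross-entropy toward the target class. Raising `lam_patch` or `lam_perc` trades attack strength for a patch that stays closer to its seed in pixel space and in feature space, respectively.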