Any Target Can be Offense: Adversarial Example Generation via Generalized Latent Infection

Youheng Sun, Shengming Yuan, Xuanhan Wang, Lianli Gao, Jingkuan Song

TL;DR

The paper tackles generalized targeted adversarial attacks, which aim to mislead DNNs toward any target class, including classes unseen during training. It proposes GAKer, a generalized adversarial attacker that performs latent infection: a UNet-based generator injects target-object features into the latent representation of a clean image and is trained with cosine-based feature losses computed in the feature space of a frozen extractor. The key contributions are (1) enabling attacks on unknown target classes with substantially improved success rates (approximately $14.13\%$ higher for unknown classes and $4.23\%$ higher for known classes than prior generator-based methods), and (2) demonstrating effectiveness against standard CNNs, adversarially trained models, and large vision-language models (LVLMs), with efficient training relative to prior methods. The approach provides a practical tool for broad vulnerability assessment and reveals pervasive weaknesses in contemporary DNNs and LVLMs, underscoring the security implications for real-world systems.

Abstract

Targeted adversarial attack, which aims to mislead a model into recognizing any image as a target object through imperceptible perturbations, has become a mainstream tool for vulnerability assessment of deep neural networks (DNNs). Since existing targeted attackers only learn to attack known target classes, they cannot generalize well to unknown classes. To tackle this issue, we propose **G**eneralized **A**dversarial attac**KER** (**GAKer**), which is able to construct adversarial examples for any target class. The core idea behind GAKer is to craft a latently infected representation during adversarial example generation. To this end, the extracted latent representations of the target object are first injected into intermediate features of an input image in an adversarial generator. Then, the generator is optimized to ensure visual consistency with the input image while being close to the target object in the feature space. Since GAKer is both class-agnostic and model-agnostic, it can be regarded as a general tool that not only reveals the vulnerability of more DNNs but also identifies deficiencies of DNNs in a wider range of classes. Extensive experiments have demonstrated the effectiveness of our proposed method in generating adversarial examples for both known and unknown classes. Notably, compared with other generative methods, our method achieves an approximately $14.13\%$ higher attack success rate for unknown classes and an approximately $4.23\%$ higher success rate for known classes. Our code is available at https://github.com/VL-Group/GAKer.
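
The two requirements named in the abstract (visual consistency with the input image and closeness to the target in feature space) can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration: the budget `eps = 16/255`, the helper names `clip_to_budget`, `cosine_distance`, and `extract`, and the random linear map standing in for a frozen feature extractor are hypothetical and are not taken from the paper or its released code.

```python
import numpy as np

def clip_to_budget(x_adv, x, eps):
    """Project x_adv into the L-infinity ball of radius eps around x, then into [0, 1]."""
    x_adv = np.clip(x_adv, x - eps, x + eps)
    return np.clip(x_adv, 0.0, 1.0)

def cosine_distance(a, b):
    """Return 1 - cosine similarity between two flattened feature vectors."""
    a, b = a.ravel(), b.ravel()
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

rng = np.random.default_rng(0)
x = rng.random((3, 8, 8))      # clean source image with values in [0, 1]
x_t = rng.random((3, 8, 8))    # image of the target object

# Hypothetical stand-in for a frozen feature extractor: a fixed random linear map.
W = rng.standard_normal((16, x.size))

def extract(img):
    return W @ img.ravel()

eps = 16 / 255                 # assumed perturbation budget, not from the source
raw = x + 0.5 * (x_t - x)      # stand-in for an unconstrained generator output
x_adv = clip_to_budget(raw, x, eps)

# Feature-space objective: a smaller value means x_adv is closer to the target.
loss = cosine_distance(extract(x_adv), extract(x_t))
```

In a real training loop the generator's parameters would be updated to reduce `loss`, while the clipping step keeps the perturbation imperceptible; the sketch only evaluates both quantities once.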

Paper Structure (22 sections, 5 equations, 18 figures, 6 tables)

Figures (18)

  • Figure 1: The inference process of different generator-based targeted attacks. In the center of each scenario is the generator $\mathcal{G}$, which takes the model inputs at the top and aims to produce adversarial examples $\bm{x'}$ at the bottom. The classes indicated within $\mathcal{G}$ (cat, dog, fish, bird) are the training classes. Blue lines denote known classes that are encountered during inference, and yellow lines denote unknown classes that were not present in the training data. Sub-figure (a) depicts a single-target attack, where each generator is specialized for one class and can therefore only attack that class. Sub-figure (b) demonstrates a multiple-target attack, where the generator $\mathcal{G}$ takes a source image $\bm{x}$ and known target labels (e.g., cat, dog) to create adversarial examples $\bm{x'}$ but fails to attack labels unseen during training (e.g., fish, bird). Sub-figure (c) represents an arbitrary-target attack, where $\mathcal{G}$ uses target images to craft adversarial examples capable of misleading the classifier into both known and unknown classes (e.g., fish, bird), highlighting the generalization capability of this approach.
  • Figure 2: The pipeline of GAKer, a generator-based method that can attack target classes unseen during training. During the training phase, target features $f_t$ are extracted by a frozen $\mathcal{F}_\psi$ and injected into the generator $\mathcal{G}_\theta$; $Clip(\cdot)$ then constrains $x'_s$ to the perturbation budget. $\mathcal{G}_\theta$ is trained to minimize the cosine distance (that is, to maximize the cosine similarity) between $f'_s$ and $f_t$, as well as between $f_\delta$ and $f_t$. Because the training strategy is built on a feature distribution that is independent of the training classes, the generator can produce adversarial examples $x'_s$ for unknown classes to attack the victim model.
  • Figure 3: Schematic diagram of feature insertion into each ResBlock of a UNet. The features are first transformed by the Feature Transform Module (FTM), followed by dimension matching through the Dimension Matching Module (DMM) layers before being integrated into each ResBlock of the UNet.
  • Figure 4: Comparison of targeted attack success rates across a range of known classes. The Res-50 serves as the substitute model, while the performance of black-box models, including Res-152, VGG-19, and Dense-121, is evaluated for both known (K) and unknown (U) classes.
  • Figure 5: Comparison of targeted attack transfer success rates under different known classes selection strategies.
  • ...and 13 more figures
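
Figures 4 and 5 report targeted attack success rates, split into known (K) and unknown (U) target classes. As a rough sketch of how such a metric is commonly computed (the fraction of adversarial examples that the victim model classifies as the intended target), the snippet below assumes hypothetical arrays of victim-model predictions and target labels and a hypothetical set of known class ids. The function names and the toy data are illustrative only and do not come from the paper.

```python
import numpy as np

def targeted_success_rate(preds, targets):
    """Fraction of examples whose predicted label equals the intended target label."""
    preds, targets = np.asarray(preds), np.asarray(targets)
    return float(np.mean(preds == targets)) if preds.size else float("nan")

def split_success_rates(preds, targets, known_classes):
    """Report the success rate separately for known and unknown target classes."""
    preds, targets = np.asarray(preds), np.asarray(targets)
    is_known = np.isin(targets, list(known_classes))
    return {
        "known": targeted_success_rate(preds[is_known], targets[is_known]),
        "unknown": targeted_success_rate(preds[~is_known], targets[~is_known]),
    }

# Toy data: victim-model predictions on six adversarial examples and their targets.
preds = [3, 7, 7, 1, 0, 4]
targets = [3, 7, 2, 1, 9, 5]
rates = split_success_rates(preds, targets, known_classes={1, 2, 3})
```

Reporting the two rates separately is what makes the generalization gap visible: a generator that only memorizes its training classes would score well on the known split and poorly on the unknown one.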