Transcending Adversarial Perturbations: Manifold-Aided Adversarial Examples with Legitimate Semantics

Shuai Li; Xiaoyu Jiang; Xiaoguang Ma

TL;DR

This work tackles the limited transferability, inferior visual quality, and weak interpretability of conventional adversarial examples crafted under $\ell_p$-norm constraints by proposing MAELS, a manifold-aided framework that generates adversarial examples with legitimate semantics along a supervised semantic-transformation manifold. Using a two-stage supervised semantic-transformation generative model (SSTGM), MAELS learns a disentangled latent space and traverses a continuous semantic path from a raw image $x$ to an adversarial example $x^*$ without relying on pixel-space perturbations, achieving superior visual quality and transferability. Empirical results on MNIST and DEFECT show that MAELS attains high ATR values while maintaining low perceptual loss, can even bypass known defenses such as adversarial training and network distillation, and reveals model vulnerabilities through continuous semantic transitions. The approach offers a principled, transferable, and interpretable way to study model vulnerabilities and robustness, with potential applications in security evaluation and semantic-aware attack generation.
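To make "traversing a continuous semantic path" concrete, here is a minimal sketch under loudly stated assumptions. `toy_generator`, `toy_victim`, and `traverse_to_adversarial` are hypothetical names, and the tiny linear-sigmoid decoder and threshold classifier merely stand in for the trained conditional generator $G$ and the victim model $\cal{F}$ (e.g., MobileNetV2). This is not the MAELS implementation; it only illustrates the search pattern of shifting one dimension of a semantic code until the victim's prediction flips.

```python
import numpy as np

# Hypothetical toy stand-ins, NOT the paper's networks: a fixed decoder that maps
# a 2-d semantic code to a 4-pixel "image", and a threshold "victim model".
A = np.array([[1.0, 0.5], [1.0, -0.5], [1.0, 0.0], [1.0, 0.2]])


def toy_generator(z):
    """Decode a semantic code z of shape (2,) into pixel intensities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-A @ z))


def toy_victim(x):
    """Predict label 1 if the mean intensity exceeds 0.5, otherwise label 0."""
    return int(x.mean() > 0.5)


def traverse_to_adversarial(z, dim, true_label, generator, victim,
                            offsets=np.linspace(-3.0, 3.0, 61)):
    """Shift one latent dimension, trying the smallest offsets first.

    Returns (offset, x_adv) for the first generated sample whose prediction
    differs from true_label, or None if every sample on the path is classified
    correctly.
    """
    for offset in sorted(offsets, key=abs):
        z_new = np.array(z, dtype=float)  # copy so the input code is untouched
        z_new[dim] += offset
        x_new = generator(z_new)
        if victim(x_new) != true_label:
            return float(offset), x_new
    return None


z = np.array([1.05, 0.0])
print(toy_victim(toy_generator(z)))  # → 1 (the starting sample is classified correctly)
offset, x_adv = traverse_to_adversarial(z, dim=0, true_label=1,
                                        generator=toy_generator, victim=toy_victim)
print(round(offset, 2), toy_victim(x_adv))  # → -1.1 0
```

Trying offsets in order of increasing magnitude returns the smallest semantic shift on the sampled grid that changes the prediction, which loosely mirrors the idea of picking the adversarial example closest to the raw image along the manifold.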

Abstract

Deep neural networks are highly vulnerable to adversarial examples crafted with small, malicious perturbations. Most conventional adversarial attacks keep the difference between adversarial examples and the corresponding raw images visually imperceptible by minimizing their geometric distance, but these geometric-distance constraints lead to limited attack transferability, inferior visual quality, and interpretability that is imperceptible to humans. In this paper, we propose a supervised semantic-transformation generative model that generates adversarial examples with real and legitimate semantics. It constructs, for the first time, an unrestricted adversarial manifold containing continuous semantic variations, realizing a legitimate transition from non-adversarial examples to adversarial ones. Comprehensive experiments on MNIST and industrial defect datasets show that our adversarial examples not only exhibit better visual quality but also achieve superior attack transferability and more effective explanations of model vulnerabilities, indicating their great potential as generic adversarial examples. The code and pre-trained models are available at https://github.com/shuaili1027/MAELS.git.
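The contrast between geometric-distance constraints and an unrestricted adversarial manifold can be written schematically. The first formulation is the standard $\ell_p$-bounded attack; the second is only a hedged reading of the manifold-aided idea, assuming $\tilde z$ is the latent code of the raw image $x$ (for instance obtained with the encoder $E$), $e_k$ is a unit vector selecting one dimension of the semantic code $z_3$, and $y$ is the true label. The paper's own equations in its Problem Definition section may differ.

```latex
% Conventional l_p-bounded attack (e.g., PGD): stay inside a norm ball around x
x^{*} = x + \delta, \qquad \|\delta\|_p \le \Delta, \qquad \mathcal{F}(x^{*}) \neq y

% Schematic manifold-aided alternative (illustrative notation, not the paper's):
% move the latent code \tilde{z} of x along one semantic direction e_k of z_3
x^{*} = G\big(\tilde{z} + t^{*} e_k\big), \qquad
t^{*} = \arg\min_{t} \, |t| \quad \text{s.t.} \quad
\mathcal{F}\big(G(\tilde{z} + t\, e_k)\big) \neq y
```

In the second formulation there is no pixel-space norm bound: candidate adversarial examples are restricted to the generator's output manifold instead, which is what allows large $\ell_2$ distances while keeping the semantics legitimate.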


Paper Structure

This paper contains 16 sections, 8 equations, 6 figures, 4 tables, and 1 algorithm.

Introduction
Related Work
  Adversarial Examples (AEs)
  Generative Adversarial Networks (GAN)
Methodology
  Problem Definition
  Supervised Semantic-Transformation Generative Model (SSTGM)
  Manifold-aided Adversarial Examples with Legitimate Semantics (MAELS)
Experiments
  Experimental Setup
  Demonstration of Transitions Towards Adversarial
  Comparison Studies on Visual Quality
  Comparison Studies on Attack Transferability
  Interpretable Network Vulnerabilities
  Breaking Known Defenses
...and 1 more section

Figures (6)

Figure 1: Demonstration of two types of AEs on six control sets from the MNIST and DEFECT datasets. In each control set, the leftmost image is the raw image, the middle one is an AE with unstructured perturbations (AEUP) generated by the well-known PGD attack madry2017towards, and the rightmost one is the MAELS example. The perturbation size $\Delta$ of PGD was set to the average $\ell_2$-norm distance obtained by our method (15.86 for MNIST and 4.49 for DEFECT) so that both attacks have the same adversarial strength. Green and red squares and labels indicate correct and incorrect classifications, respectively, by a well-trained MobileNetV2.
Figure 2: An overview of the SSTGM and MAELS framework. $G$, $D$, $Q$, $E$, and $\cal{F}$ denote the conditional generator, the discriminator, the auxiliary decoder, the encoder, and the target victim model to be attacked, respectively. Green and red squares mark examples that $\cal{F}$ predicts correctly and incorrectly, respectively. $x$ and $x'$ denote raw images and examples generated by $G$, respectively; $x'$ comprises reconstructed images $x^R$ and semantic variants $x^{SO}$. In the second stage, the AEs $x^*$ are contained in $x^{SO}$. $Q(z_2|x^\prime)$, $Q(z_3|x^\prime)$, and ${\cal{F}}(z_2|x^\prime)$ denote the disentangled codes $z_2$ and $z_3$ recovered by $Q$ and the query results of $\cal{F}$ for $x^\prime$, respectively. Each stage is divided into a training part and a testing part; in each testing part, the well-trained $G$ is used to test the generation of a continuous manifold.
Figure 3: Transitions towards adversarial on manifolds with continuous semantics. By manipulating two dimensions (1 and 2) of the semantic code $z_3$, we obtain the $5 \times 5$ image matrices on the left. On the right, the corresponding $5 \times 5$ heatmaps show the predictions of MobileNetV2 and their associated confidences. Images correctly classified by MobileNetV2 (non-adversarial) are highlighted with green squares, and misclassified ones (adversarial) with red squares.
Figure 4: Visual quality of AEs generated by unstructured perturbations and by MAELS, where the unstructured perturbations are produced by the PGD adversary. The PGD budget $\Delta$ is set to the average $\ell_2$-norm distance between the MAELS examples and their corresponding raw images, which is $15.86$ on MNIST and $4.49$ on DEFECT.
Figure 5: Continuous transitions created along a specific dimension of $z_3$. Red marks incorrect classifications under well-trained models. Predicted labels and confidence scores under different models are also given for each group of semantic variants.
...and 1 more figure
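Figures 1 and 4 describe matching PGD's $\ell_2$ budget $\Delta$ to the average $\ell_2$ distance between MAELS examples and their raw images. The sketch below shows that budget matching under explicit assumptions: `mean_l2_distance` and `pgd_l2` are hypothetical helpers, the victim is a toy logistic-regression model rather than MobileNetV2, and the data, step size, and step count are made up for illustration (they are not the paper's settings, and the real budgets were 15.86 and 4.49 on full-size images).

```python
import numpy as np


def mean_l2_distance(raw, adv):
    """Average per-image l2 distance between two batches of shape (N, ...)."""
    diff = (adv - raw).reshape(len(raw), -1)
    return float(np.linalg.norm(diff, axis=1).mean())


def pgd_l2(x0, y, w, b, radius, step_size=0.2, steps=10):
    """l2-bounded PGD against a toy logistic-regression victim sigmoid(w.x + b).

    y is the true label in {0, 1}; the loss is binary cross-entropy.
    """
    x = x0.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
        grad = (p - y) * w                    # gradient of the BCE loss w.r.t. x
        g_norm = np.linalg.norm(grad)
        if g_norm == 0:
            break
        x = x + step_size * grad / g_norm     # normalized ascent step on the loss
        delta = x - x0
        d_norm = np.linalg.norm(delta)
        if d_norm > radius:                   # project back onto the l2 ball
            delta = delta * (radius / d_norm)
        x = np.clip(x0 + delta, 0.0, 1.0)     # keep pixels in [0, 1]
    return x


# Hypothetical data: two raw images and their semantic adversarial counterparts.
raw = np.zeros((2, 4))
maels = np.array([[0.6, 0.0, 0.0, 0.0], [0.0, 0.6, 0.8, 0.0]])
radius = mean_l2_distance(raw, maels)         # (0.6 + 1.0) / 2, i.e. about 0.8

w, b = np.array([1.0, -1.0, 1.0, -1.0]), 0.0
x0 = np.array([0.8, 0.2, 0.7, 0.3])          # w.x0 = 1.0, so predicted label 1
x_adv = pgd_l2(x0, 1, w, b, radius)
print(int(w @ x_adv + b > 0))                 # → 0 (prediction flipped)
```

Normalizing the gradient before each step is the usual choice for $\ell_2$ PGD (the $\ell_\infty$ variant uses the gradient's sign instead). Clipping after projection cannot push the example outside the ball, because the raw image already lies in $[0, 1]$ and clipping only moves each pixel toward it.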
