Transcending Adversarial Perturbations: Manifold-Aided Adversarial Examples with Legitimate Semantics
Shuai Li, Xiaoyu Jiang, Xiaoguang Ma
TL;DR
This work tackles the brittleness of conventional adversarial examples crafted with $\ell_p$-norm constraints by proposing MAELS, a manifold-aided framework that generates adversarial examples with legitimate semantics along a supervised semantic-transformation manifold. Using a two-stage SSTGM, MAELS learns a disentangled latent space and traverses a continuous semantic path to produce $x^*$ from $x$ without relying on pixel-space perturbations, achieving superior visual quality and transferability. Empirical results on MNIST and DEFECT show MAELS attains high ATR values while maintaining low perceptual loss and can even bypass known defenses like adversarial training and network distillation, revealing meaningful interpretability through continuous semantic transitions. The approach offers a principled, transferable, and interpretable way to study model vulnerabilities and robustness, with potential applications in security evaluation and semantic-aware attack generation.
Abstract
Deep neural networks are highly vulnerable to adversarial examples crafted with tiny malicious perturbations. Although most conventional adversarial attacks ensure visual imperceptibility between adversarial examples and the corresponding raw images by minimizing their geometric distance, these constraints on geometric distance lead to limited attack transferability, inferior visual quality, and interpretability that humans cannot readily perceive. In this paper, we propose a supervised semantic-transformation generative model to generate adversarial examples with real and legitimate semantics, wherein an unrestricted adversarial manifold containing continuous semantic variations is constructed for the first time to realize a legitimate transition from non-adversarial examples to adversarial ones. Comprehensive experiments on MNIST and industrial defect datasets show that our adversarial examples not only exhibit better visual quality but also achieve superior attack transferability and more effective explanations for model vulnerabilities, indicating their great potential as generic adversarial examples. The code and pre-trained models are available at https://github.com/shuaili1027/MAELS.git.
