AED-PADA: Improving Generalizability of Adversarial Example Detection via Principal Adversarial Domain Adaptation

Heqi Peng, Yunhong Wang, Ruijie Yang, Beichen Li, Rui Wang, Yuanfang Guo

TL;DR

This work tackles the poor generalization of adversarial example detectors, which typically arises because they are trained on examples from a single attack. It introduces AED-PADA, a two-stage framework. The first stage identifies Principal Adversarial Domains (PADs) through three steps: Adv-SCL, adversarial domain (AD) clustering with a Jensen-Shannon-based similarity, and CEFS-guided selection. The second stage performs Principal Adversarial Domain Adaptation, using Multi-source Unsupervised Domain Adaptation (MUDA) with Adversarial Feature Enhancement to detect unseen attacks. The method achieves superior cross-attack generalization on CIFAR-10, SVHN, and ImageNet and across multiple backbones. It also remains robust to different perturbation magnitudes and is compatible with several MUDA strategies. In practice, AED-PADA offers a scalable, real-time-capable approach to adversarial detection that maintains performance on new attacks without retraining, which advances the deployment of robust defenses in real-world systems.
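This summary does not spell out the Jensen-Shannon-based similarity used for adversarial domain clustering. The sketch below makes two assumptions. First, it summarizes each adversarial domain as a discrete probability distribution, for example a normalized histogram of detector features. Second, it defines similarity as one minus the base-2 JS divergence, so the score lies in [0, 1]. Domains above a similarity threshold are merged by single linkage. The names `js_divergence`, `similarity` and `cluster_domains`, the threshold, and the toy distributions are all hypothetical. None of them come from the paper.

```python
import math
from itertools import combinations


def js_divergence(p, q):
    """Base-2 Jensen-Shannon divergence between two discrete distributions (in [0, 1])."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing; b_i > 0 whenever a_i > 0 because b = m.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def similarity(p, q):
    # Hypothetical choice: identical domains score 1, disjoint domains score 0.
    return 1.0 - js_divergence(p, q)


def cluster_domains(domains, threshold):
    """Single-linkage grouping (union-find): domains with similarity >= threshold share a cluster."""
    names = list(domains)
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if similarity(domains[a], domains[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(c) for c in clusters.values())


# Toy, made-up feature histograms (3 bins) for four hypothetical adversarial domains.
domains = {
    "A1": [0.7, 0.2, 0.1],
    "A2": [0.6, 0.3, 0.1],
    "A3": [0.1, 0.2, 0.7],
    "A4": [0.15, 0.15, 0.7],
}
print(cluster_domains(domains, threshold=0.9))  # [['A1', 'A2'], ['A3', 'A4']]
```

Base 2 keeps the divergence bounded by 1, so "1 minus JSD" is a valid similarity without further normalization. Any other clustering algorithm could replace the simple threshold-based grouping.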

Abstract

Adversarial example detection, which can be conveniently applied in many scenarios, is important in the area of adversarial defense. Unfortunately, existing detection methods generalize poorly. Their training usually relies on examples generated by a single known adversarial attack, and these training examples differ substantially from the unseen adversarial examples encountered at test time. To address this issue, we propose a novel method named Adversarial Example Detection via Principal Adversarial Domain Adaptation (AED-PADA). Specifically, our approach first identifies the Principal Adversarial Domains (PADs), i.e., a combination of features of adversarial examples generated by different attacks, chosen so that it covers a large portion of the entire adversarial feature space. We then exploit, for the first time in adversarial example detection, Multi-source Unsupervised Domain Adaptation, using the PADs as the source domains. Experimental results demonstrate the superior generalization ability of AED-PADA. This superiority is most pronounced in challenging scenarios where the perturbations are subject to the minimal magnitude constraint.
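The abstract characterizes PADs by how much of the adversarial feature space they cover. This summary does not define the paper's CEFS criterion, so the sketch below substitutes a plain greedy maximum-coverage heuristic. It assumes that each adversarial domain occupies a set of discretized feature cells. Domains are then added one at a time, each time choosing the one that most enlarges the covered area. `coverage`, `greedy_select` and the toy cell sets are hypothetical illustrations, not the authors' selection procedure.

```python
def coverage(selected, domains, universe_size):
    """Fraction of the discretized feature space covered by the union of the selected domains."""
    covered = set().union(*(domains[name] for name in selected))
    return len(covered) / universe_size


def greedy_select(domains, k, universe_size):
    """Greedy max-coverage: repeatedly add the domain that yields the largest union coverage."""
    selected = []
    for _ in range(k):
        # Sorted names make ties resolve deterministically (first name wins).
        candidates = [name for name in sorted(domains) if name not in selected]
        best = max(candidates, key=lambda name: coverage(selected + [name], domains, universe_size))
        selected.append(best)
    return selected


# Toy feature space of 10 cells; each hypothetical domain occupies a subset of cells.
domains = {
    "A1": {0, 1, 2, 3},
    "A2": {1, 2, 3, 4},
    "A3": {5, 6},
    "A4": {7, 8, 9},
}
picked = greedy_select(domains, k=2, universe_size=10)
print(picked, coverage(picked, domains, 10))  # ['A1', 'A4'] 0.7
print(coverage(["A1", "A2"], domains, 10))    # 0.5
```

In this toy case, the two heavily overlapping domains A1 and A2 cover only half the space. The coverage-driven choice covers 70%, which mirrors the intuition that broader source coverage creates more potential overlap with an unseen target domain.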

Paper Structure (30 sections, 11 equations, 3 figures, 15 tables)

Figures (3)

  • Figure 1: Schematic illustration of our proposed work. (a) depicts an example adversarial feature space containing 9 adversarial domains $\{A_1, A_2, \ldots, A_9\}$. (b) illustrates existing detection methods, which usually train on a single source domain and then detect examples from an unseen adversarial domain, e.g., the target domain $T$ ($T = A_9$). (c) shows a straightforward way to improve generalization: randomly selecting multiple source domains. (d) presents the intuition behind our work. We construct PADs that cover a larger portion of the entire feature space, which creates more potential overlap with the target domain and thereby substantially enhances detection generalization.
  • Figure 2: Our AED-PADA framework contains two stages: (a) Principal Adversarial Domains Identification, which consists of Adversarial Domain Acquisition, Adversarial Domain Clustering and Principal Adversarial Domains Selection, and (b) Principal Adversarial Domain Adaptation.
  • Figure 3: The three kernels of PEF employed to capture the perturbation signals hidden in the adversarial inputs.
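Figure 3 refers to three kernels in PEF that expose perturbation signals hidden in adversarial inputs. This summary does not give their values. As a hedged illustration, the sketch below applies three hypothetical zero-sum high-pass kernels to a grayscale image stored as nested lists: a Laplacian, a first-order horizontal difference, and an SRM-style 3x3 "square" kernel. Because each kernel sums to zero, smooth image content is suppressed, while small high-frequency perturbations survive in the residual maps. `KERNELS`, `filter2d` and `extract_residuals` are illustrative names, not the paper's code.

```python
# Hypothetical zero-sum high-pass kernels (NOT the paper's actual PEF kernels).
KERNELS = {
    "laplacian": [[0, -1, 0], [-1, 4, -1], [0, -1, 0]],
    "horizontal_diff": [[0, 0, 0], [0, -1, 1], [0, 0, 0]],
    "square_3x3": [[-0.25, 0.5, -0.25], [0.5, -1.0, 0.5], [-0.25, 0.5, -0.25]],
}


def filter2d(image, kernel):
    """'Valid' 2-D cross-correlation (what CNN conv layers compute) of a grayscale image."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            row.append(sum(kernel[a][b] * image[i + a][j + b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out


def extract_residuals(image):
    """Apply every fixed kernel and return one residual map per kernel name."""
    return {name: filter2d(image, k) for name, k in KERNELS.items()}


# Usage: a flat image yields all-zero residuals, while a single-pixel
# "perturbation" produces a strong response at the matching location.
flat = [[0.5] * 5 for _ in range(5)]
spike = [[0.0] * 5 for _ in range(5)]
spike[2][2] = 1.0
print(extract_residuals(spike)["laplacian"][1][1])  # 4.0
```

In a real detector, such fixed kernels would typically be applied as a non-trainable convolution layer in front of the backbone. The nested-list version above only makes the arithmetic explicit.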