Self-Supervised Representation Learning for Adversarial Attack Detection

Yi Li; Plamen Angelov; Neeraj Suri

TL;DR

This paper tackles adversarial attack detection with a self-supervised framework that requires no labeled adversarial examples. It combines three objectives: a pixel-mapping loss $\mathcal{L}_{\text{PM}}$, a prototype-wise contrastive estimation loss $\mathcal{L}_{\text{PCE}}$, and an instance-wise contrastive loss $\mathcal{L}_{\text{ICL}}$ backed by a discrimination bank (an instance-discrimination memory). All three are learned with a parallel axial-attention encoder (PAA-ResNet). On ImageNet, CIFAR-10, and COCO, the approach achieves state-of-the-art detection performance on unseen attacks while remaining efficient, owing to parallelized attention and a discriminative memory used at training time. The results indicate robust, transferable representations that reduce labeling needs and adapt to new datasets and attack algorithms, which makes the method practical for secure AI systems.

Abstract

Supervised learning-based adversarial attack detection methods rely on large amounts of labeled data and suffer significant performance degradation when the trained model is applied to new domains. In this paper, we propose a self-supervised representation learning framework for adversarial attack detection to address this drawback. First, we map the pixels of augmented input images into an embedding space. Then, we employ a prototype-wise contrastive estimation loss to cluster prototypes as latent variables. Additionally, drawing inspiration from memory banks, we introduce a discrimination bank that distinguishes and learns representations for each individual instance sharing the same or a similar prototype, establishing a connection between instances and their associated prototypes. We also propose a parallel axial-attention (PAA)-based encoder that facilitates training by processing the height and width axes of the attention maps in parallel. Experimental results show that, compared with various benchmark self-supervised vision learning models and supervised adversarial attack detection methods, the proposed model achieves state-of-the-art performance on adversarial attack detection across a wide range of images.
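
To make the idea of attending over the height and width axes in parallel more concrete, the following is a minimal NumPy sketch, not the paper's PAA block. It assumes single-head attention, no positional encodings, and a simple residual sum to fuse the two branches; the function names (`axial_attention`, `parallel_axial_attention`) and the fusion rule are illustrative assumptions. The point it shows is that both branches read the same input, so neither has to wait for the other, in contrast to sequential axial attention, where the width pass consumes the output of the height pass.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def axial_attention(x, wq, wk, wv, axis):
    """Single-head self-attention restricted to one spatial axis.

    x: feature map of shape (H, W, C).
    axis: 0 to attend along height (within each column),
          1 to attend along width (within each row).
    """
    if axis == 0:
        # (W, H, C): each column becomes one sequence of length H.
        x = x.transpose(1, 0, 2)
    q, k, v = x @ wq, x @ wk, x @ wv                      # (B, L, D)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])  # (B, L, L)
    out = softmax(scores, axis=-1) @ v                    # (B, L, D)
    if axis == 0:
        out = out.transpose(1, 0, 2)                      # back to (H, W, D)
    return out


def parallel_axial_attention(x, params_h, params_w):
    # Both branches take the SAME input x, so they are independent and
    # could be computed concurrently. Summing them with a residual
    # connection is an assumption made for this sketch.
    out_h = axial_attention(x, *params_h, axis=0)
    out_w = axial_attention(x, *params_w, axis=1)
    return x + out_h + out_w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, W, C = 4, 5, 8
    x = rng.normal(size=(H, W, C))
    params_h = tuple(rng.normal(size=(C, C)) for _ in range(3))
    params_w = tuple(rng.normal(size=(C, C)) for _ in range(3))
    y = parallel_axial_attention(x, params_h, params_w)
    print(y.shape)
```

Each branch costs attention over sequences of length H or W rather than over all H·W positions at once, which is the usual motivation for axial attention; the parallel arrangement additionally removes the dependency between the two passes.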

Paper Structure (25 sections, 8 equations, 5 figures, 6 tables)

Introduction
Related Works
Self-Supervised Learning
Contrastive Learning
Axial-attention
Self-Supervised Representation Learning
Preliminaries
Pixel Mapping
Prototype-wise Contrastive Estimation
Instance-Wise Contrastive Learning
Parallel Axial-attention-Based Encoder
Experiments
Datasets and Attacks
Backbones and Competitors
Implementation Details
...and 10 more sections

Figures (5)

Figure 1: Self-supervised representation learning framework.
Figure 2: Parallel axial-attention (PAA) block.
Figure 3: Ablation study for (a) the number of parallel axial-attention (PAA) blocks, (b) attention blocks added to different backbones, and (c) the hyperparameter $\tau$. The red and purple rectangles represent the accuracy improvements when axial attention and PAA, respectively, are added to the backbones.
Figure 4: Ablation study for (a) the hyperparameter $\beta$ and (b) the hyperparameters $\lambda_1$ and $\lambda_2$.
Figure 5: t-SNE feature visualization of the model with (a) $\mathcal{L}_{\text{PM}}$; (b) $\mathcal{L}_{\text{PM}}+\mathcal{L}_{\text{PCE}}$; (c) $\mathcal{L}_{\text{PM}}+\mathcal{L}_{\text{PCE}}+\mathcal{L}_{\text{ICL}}$.
