Exploiting Supervised Poison Vulnerability to Strengthen Self-Supervised Defense

Jeremy Styborski; Mingzhi Lyu; Yi Huang; Adams Kong

TL;DR

Availability poisons corrupt supervised learning (SL) by injecting imperceptible, class-related shortcuts, so that poisoned models generalize poorly to clean data. Self-supervised learning (SSL) resists some poisons but may still encode poison cues. VESPR, a multi-task framework, exploits SL's vulnerability to generate adversarial augmentations within SSL training. It optimizes $L_{VESPR}= \alpha L_{CL}+\beta L_{CE}$, where the perturbation $\delta^* = \arg\max_{\delta} L_{CE}(c(f(x+\delta)),y)$ is computed over $T$ steps under the constraint $\|\delta\|_{\infty} \le \epsilon_{AT}$, so that training emphasizes robust features. Evaluated on CIFAR-10 and ImageNet-100 against seven poisons, VESPR outperforms six defenses, achieving the highest minimum and average poison accuracies (e.g., on ImageNet-100, Psn Min/Avg improved by up to $+16\%$ and $+9\%$) while preserving strong clean accuracy. Ablation studies show that the SL-guided adversarial augmentations are critical: VESPR yields more robust representations in which poisoned and clean images align (lower $\text{Psn Local Lip}$, higher $\text{Psn-Cln Sim}$), and it remains resilient across SSL variants.
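The inner maximization and the combined loss in the summary above can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation: it assumes the encoder $f$ is the identity and the classifier head $c$ is a linear layer `W`, so that the input gradient of the cross-entropy loss has a closed form. The function names, the `step_size` parameter, and the sign-gradient update are assumptions introduced for illustration; clipping to a valid pixel range is omitted.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logits_of(W, x):
    # Linear classifier: one logit per row of W (stands in for c(f(x))).
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def cross_entropy(W, x, y):
    # L_CE for a single example with integer label y.
    return -math.log(softmax(logits_of(W, x))[y])

def ce_grad_wrt_input(W, x, y):
    # For a linear classifier: dL_CE/dx = W^T (softmax(Wx) - onehot(y)).
    p = softmax(logits_of(W, x))
    p[y] -= 1.0
    return [sum(W[k][j] * p[k] for k in range(len(W))) for j in range(len(x))]

def sign(v):
    return (v > 0) - (v < 0)

def pgd_perturbation(W, x, y, eps_at, steps, step_size):
    # Approximates delta* = argmax L_CE(c(f(x + delta)), y)
    # subject to ||delta||_inf <= eps_at, using `steps` ascent steps.
    delta = [0.0] * len(x)
    for _ in range(steps):
        x_adv = [xi + di for xi, di in zip(x, delta)]
        g = ce_grad_wrt_input(W, x_adv, y)
        # Ascend along the gradient sign, then project onto the L-inf ball.
        delta = [max(-eps_at, min(eps_at, di + step_size * sign(gi)))
                 for di, gi in zip(delta, g)]
    return delta

def vespr_loss(l_cl, l_ce, alpha, beta):
    # L_VESPR = alpha * L_CL + beta * L_CE (both losses computed elsewhere).
    return alpha * l_cl + beta * l_ce

# Toy usage: two classes, two input features (hypothetical values).
W = [[1.0, -1.0], [-1.0, 1.0]]
x, y = [0.5, 0.2], 0
delta = pgd_perturbation(W, x, y, eps_at=0.1, steps=5, step_size=0.05)
x_adv = [xi + di for xi, di in zip(x, delta)]
```

In a real pipeline the gradient would come from automatic differentiation through the encoder and classifier head, and the perturbed view would then be re-encoded for the contrastive loss.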

Abstract

Availability poisons exploit supervised learning (SL) algorithms by introducing class-related shortcut features in images such that models trained on poisoned data are useless for real-world datasets. Self-supervised learning (SSL), which utilizes augmentations to learn instance discrimination, is regarded as a strong defense against poisoned data. However, by extending the study of SSL across multiple poisons on the CIFAR-10 and ImageNet-100 datasets, we demonstrate that it often performs poorly, far below that of training on clean data. Leveraging the vulnerability of SL to poison attacks, we introduce adversarial training (AT) on SL to obfuscate poison features and guide robust feature learning for SSL. Our proposed defense, designated VESPR (Vulnerability Exploitation of Supervised Poisoning for Robust SSL), surpasses six previous defenses across seven popular availability poisons, boosting the minimum and average ImageNet-100 test accuracies of poisoned models by 16% and 9%, respectively. Through analysis and ablation studies, we elucidate the mechanisms by which VESPR learns robust class features.

Paper Structure (33 sections, 3 equations, 5 figures, 14 tables)

Introduction
Related Works
Data Poisoning
Poison Defenses
Self-Supervised Learning
Multi-Task Learning
Method
Formulation of Availability Poison
Analysis of SL and SSL against Availability Poisons
VESPR
Experiments
Experiment Setup
VESPR Performance
Ablation Studies
Architecture Ablation
...and 18 more sections

Figures (5)

Figure 1: (a) Mean cosine similarity between representations of clean images and poison images (Psn-Cln Sim, $\mathrm{cossim}[f(x_{p,i}),f(x_{c,i})]$). (b) Poison (Psn) and clean (Cln) classification accuracies tested with $x_{p,i}$ and $x_{c,i}$, respectively. We display curves for SL and SSL models, both trained on adversarial poison [Fowl_2021_TAP] data.
Figure 2: Overview of VESPR architecture. For each input image, VESPR generates two views through augmentation and encodes them with encoder $f(\cdot)$ as representations $r_1$ and $r_2$, respectively. Cross-entropy (CE) loss is calculated from the output of the classifier head, $c(r_1)$. The CE loss gradient is used to generate adversarial perturbations to the first view using projected gradient descent [Kurakin_2017_PGD; Madry_2019_PGD]. The adversarial images are encoded to replace the original representations. Contrastive loss is calculated from projected representations, $g(r_1)$ and $g(r_2)$, using projections of other images as negative samples. The VESPR network is trained from the combined CE and contrastive losses.
Figure 3: 2-dimensional t-SNE plots [Hinton_2002_TSNE] of clean and CUDA-poisoned image representations for SL, SSL, and VESPR models trained on CUDA data.
Figure 4: Ablation of the $\alpha$ weight in the VESPR loss, $L_{VESPR}= \alpha L_{CL}+\beta L_{CE}$. $\beta$ is held constant at $0.5$. The dataset is ImageNet-100.
Figure 5: 2-dimensional t-SNE plots [Hinton_2002_TSNE] of clean and poison image representations for SL, SL+AT, SSL, SSL+SL, and VESPR models trained on AP8 and CUDA data. Color schemes denote different image classes.
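The Psn-Cln Sim metric named in the Figure 1 caption is the mean cosine similarity between the representation of each poisoned image and that of its clean counterpart. A minimal sketch of that computation follows, assuming the representations $f(x_{p,i})$ and $f(x_{c,i})$ have already been extracted as paired lists of non-zero vectors; the function names are hypothetical.

```python
import math

def cosine_similarity(u, v):
    # cossim[u, v] = <u, v> / (||u|| * ||v||); assumes non-zero vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def psn_cln_sim(poison_reps, clean_reps):
    # Mean over i of cossim[f(x_p_i), f(x_c_i)] for paired representations.
    sims = [cosine_similarity(p, c) for p, c in zip(poison_reps, clean_reps)]
    return sum(sims) / len(sims)

# Toy usage: the first pair is parallel (similarity 1), the second orthogonal (0).
score = psn_cln_sim([[1.0, 0.0], [0.0, 2.0]], [[2.0, 0.0], [1.0, 0.0]])
```

A value near 1 indicates that the encoder maps a poisoned image close to its clean version, which is the behavior the summary associates with robust representations.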
