Exploiting Supervised Poison Vulnerability to Strengthen Self-Supervised Defense
Jeremy Styborski, Mingzhi Lyu, Yi Huang, Adams Kong
TL;DR
Availability poisons corrupt supervised learning (SL) by injecting imperceptible, class-related shortcuts into training images, so that the resulting models generalize poorly to clean data. Self-supervised learning (SSL) resists some poisons but can still encode poison cues. VESPR is a multi-task framework that exploits SL's vulnerability to generate adversarial augmentations during SSL training. It optimizes $L_{VESPR} = \alpha L_{CL} + \beta L_{CE}$, where the perturbation $\delta^* = \arg\max_{\delta} L_{CE}(c(f(x+\delta)),y)$ is computed in $T$ steps subject to $\|\delta\|_{\infty} \le \epsilon_{AT}$, so that training emphasizes robust features. Evaluated on CIFAR-10 and ImageNet-100 against seven poisons, VESPR outperforms six defenses and achieves the highest minimum and average poison accuracies (on ImageNet-100, Psn Min and Psn Avg improve by up to $+16\%$ and $+9\%$, respectively) while preserving strong clean accuracy. Ablation studies show that the SL-guided adversarial augmentations are critical: VESPR yields representations of poisoned images that are more robust (lower $\text{Psn Local Lip}$) and closer to those of clean images (higher $\text{Psn-Cln Sim}$), and it remains resilient across SSL variants.
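To make the inner maximization above concrete, the following is a minimal sketch of a signed-gradient (PGD-style) search for $\delta$ under the $\|\delta\|_{\infty} \le \epsilon_{AT}$ constraint, together with the weighted sum $\alpha L_{CL} + \beta L_{CE}$. It is not the authors' implementation. A linear softmax classifier stands in for $c(f(\cdot))$ so that the input gradient can be written by hand, and the function names, the step-size rule, and the default $\epsilon_{AT} = 8/255$ are all illustrative assumptions. Clipping $x+\delta$ to the valid pixel range is omitted for brevity.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(W, b, x, y):
    # Mean cross-entropy of a linear softmax classifier (assumed stand-in for c(f(x))).
    p = softmax(x @ W + b)
    return -np.log(p[np.arange(len(y)), y] + 1e-12).mean()

def ce_grad_wrt_input(W, b, x, y):
    # Gradient of the mean cross-entropy with respect to the inputs x.
    p = softmax(x @ W + b)
    p[np.arange(len(y)), y] -= 1.0          # softmax output minus one-hot label
    return p @ W.T / len(y)

def pgd_augment(W, b, x, y, eps=8 / 255, steps=5, step_size=None):
    # Approximates delta* = argmax_delta L_CE(x + delta, y) with ||delta||_inf <= eps.
    # The step-size rule below is an assumption, not taken from the paper.
    if step_size is None:
        step_size = 2.5 * eps / steps
    delta = np.zeros_like(x)
    for _ in range(steps):
        g = ce_grad_wrt_input(W, b, x + delta, y)
        # Signed-gradient ascent step, then projection onto the L-infinity ball.
        delta = np.clip(delta + step_size * np.sign(g), -eps, eps)
    return delta

def vespr_loss(l_cl, l_ce, alpha=1.0, beta=1.0):
    # Weighted multi-task objective: alpha * L_CL + beta * L_CE.
    return alpha * l_cl + beta * l_ce

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W, b = rng.normal(size=(12, 3)), np.zeros(3)
    x, y = rng.uniform(size=(4, 12)), np.array([0, 1, 2, 0])
    delta = pgd_augment(W, b, x, y)
    print("clean CE:", cross_entropy(W, b, x, y))
    print("adversarial CE:", cross_entropy(W, b, x + delta, y))
```

In a full pipeline the perturbed inputs $x + \delta^*$ would be passed to the SSL branch as an additional augmentation. The sketch covers only the perturbation search and the loss weighting.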
Abstract
Availability poisons exploit supervised learning (SL) algorithms by introducing class-related shortcut features into images, so that models trained on the poisoned data generalize poorly to clean, real-world data. Self-supervised learning (SSL), which uses augmentations to learn instance discrimination, is regarded as a strong defense against poisoned data. However, by extending the study of SSL to multiple poisons on the CIFAR-10 and ImageNet-100 datasets, we demonstrate that SSL often performs poorly, with accuracy far below that obtained by training on clean data. Leveraging the vulnerability of SL to poison attacks, we introduce adversarial training (AT) on SL to obfuscate poison features and guide robust feature learning for SSL. Our proposed defense, designated VESPR (Vulnerability Exploitation of Supervised Poisoning for Robust SSL), outperforms six previous defenses across seven popular availability poisons, boosting the minimum and average ImageNet-100 test accuracies of poisoned models by 16% and 9%, respectively. Through analysis and ablation studies, we elucidate the mechanisms by which VESPR learns robust class features.
