Invisible Backdoor Attack against Self-supervised Learning
Hanrong Zhang, Zhenting Wang, Boheng Li, Fulin Lin, Tingxu Han, Mingyu Jin, Chenlu Zhan, Mengnan Du, Hongwei Wang, Shiqing Ma
TL;DR
Self-supervised learning (SSL) models are vulnerable to backdoor attacks, but existing invisible triggers designed for supervised learning underperform in SSL because the backdoor signal becomes entangled with the data augmentations. The authors propose INACTIVE, an imperceptible backdoor that disentangles the trigger from SSL augmentations by operating in HSV/HSL color spaces and optimizing the trigger under alignment and stealth constraints. They report near-perfect attack success rates across five datasets and six SSL algorithms, along with strong stealth metrics and robustness to existing defenses. The work highlights a substantial security risk in SSL pipelines and suggests a need for defenses that account for distributional disentanglement and perceptual stealth. Code is released for reproducibility.
Abstract
Self-supervised learning (SSL) models are vulnerable to backdoor attacks. Existing backdoor attacks that are effective in SSL often rely on noticeable triggers, such as colored patches or visible noise, which can be detected by human inspection. This paper proposes an imperceptible and effective backdoor attack against self-supervised models. We first find that existing imperceptible triggers designed for supervised learning are less effective at compromising self-supervised models. We then attribute this ineffectiveness to the overlap between the distributions of the backdoor samples and the augmented samples used in SSL. Building on this insight, we design an attack that uses optimized triggers disentangled from the augmentation transformations used in SSL while remaining imperceptible to human vision. Experiments on five datasets and six SSL algorithms demonstrate that our attack is highly effective and stealthy. It is also strongly resistant to existing backdoor defenses. Our code can be found at https://github.com/Zhang-Henry/INACTIVE.
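
To make the idea of a color-space trigger concrete, the following is a minimal sketch of a perturbation applied in HSV space, together with a PSNR check as a rough proxy for imperceptibility. It is an illustration only and not the INACTIVE method: the attack described above uses optimized triggers, whereas this sketch applies a fixed, hand-chosen offset. The function names (apply_hsv_shift, psnr), the offset values, and the three sample pixels are all assumptions introduced here.

```python
import colorsys
import math


def apply_hsv_shift(pixels, dh=0.02, ds=0.03, dv=0.0):
    """Shift each RGB pixel (floats in [0, 1]) by a fixed offset in HSV space.

    Hypothetical stand-in for a trigger: the offsets are hand-chosen here,
    not learned as in the attack described in the abstract.
    """
    shifted = []
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r, g, b)
        h = (h + dh) % 1.0                # hue is circular, so wrap around
        s = min(max(s + ds, 0.0), 1.0)    # clamp saturation to [0, 1]
        v = min(max(v + dv, 0.0), 1.0)    # clamp value to [0, 1]
        shifted.append(colorsys.hsv_to_rgb(h, s, v))
    return shifted


def psnr(a, b):
    """Peak signal-to-noise ratio in dB for pixel lists with a peak value of 1.0."""
    squared_error = sum(
        (x - y) ** 2 for p, q in zip(a, b) for x, y in zip(p, q)
    )
    mse = squared_error / (3 * len(a))
    return float("inf") if mse == 0 else 10 * math.log10(1.0 / mse)


# Three illustrative RGB pixels standing in for an image.
clean = [(0.8, 0.4, 0.2), (0.1, 0.6, 0.3), (0.5, 0.5, 0.9)]
triggered = apply_hsv_shift(clean)
print(f"PSNR between clean and shifted pixels: {psnr(clean, triggered):.1f} dB")
```

A higher PSNR indicates a smaller pixel-level change. A learned trigger would replace the fixed offsets with parameters optimized for both attack effectiveness and stealth, which this sketch does not attempt.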
