Towards Adversarial Robustness And Backdoor Mitigation in SSL

Aryan Satpathy; Nilaksh Singh; Dhruva Rajwade; Somesh Kumar

TL;DR

Self-supervised representations are vulnerable to backdoor attacks when a small fraction of training data is poisoned. The authors propose two frequency-domain augmentations, Gaussian Blur and Frequency Patch, to disrupt frequency-space triggers and improve robustness, with an extension to supervised learning through a joint augmentation loss. Empirical results on CIFAR-10/100 show reduced backdoor attack success rates while preserving clean accuracy, and RobustBench assessments suggest enhanced adversarial robustness under $\ell_\infty$ perturbations ($\epsilon = \frac{2}{255}$). The work provides a scalable defense framework for SSL backdoors and lays groundwork for extending robustness strategies to other learning paradigms.

Abstract

Self-Supervised Learning (SSL) has shown great promise in learning representations from unlabeled data. The ability to learn representations without human annotations has made SSL a widely used technique in real-world problems. However, SSL methods have recently been shown to be vulnerable to backdoor attacks, in which an adversary manipulates the learned representations either by tampering with the training data distribution or by modifying the model itself. This work addresses defense against backdoor attacks in SSL in the setting where the adversary has access to a realistic fraction of the SSL training data and no access to the model. We use novel methods that are computationally efficient and generalizable across different problem settings. We also investigate the adversarial robustness of SSL models trained with our method, and provide insights into how frequency-domain augmentations increase robustness in SSL. We demonstrate the effectiveness of our method on a variety of SSL benchmarks, and show that it mitigates backdoor attacks while maintaining high performance on downstream tasks. Code for our work is available at github.com/Aryan-Satpathy/Backdoor
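
The frequency-domain intuition behind a blur augmentation can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the checkerboard "trigger", the kernel width `sigma`, the `cutoff` threshold, and the helper names `gaussian_blur` and `high_freq_energy` are all assumptions made for illustration, and wrap-around padding is chosen only so that the spectral comparison is free of edge artifacts. The sketch shows that a Gaussian blur acts as a low-pass filter: it suppresses a high-frequency perturbation while leaving low-frequency image content nearly unchanged.

```python
import numpy as np

def gaussian_kernel_1d(sigma: float, radius: int) -> np.ndarray:
    """Normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(image: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Separable Gaussian blur of a 2-D grayscale image.

    Wrap-around padding is an assumption of this sketch (not taken from the
    paper); it makes the blur a circular convolution, matching the FFT below.
    """
    radius = max(1, int(3 * sigma))
    k = gaussian_kernel_1d(sigma, radius)
    padded = np.pad(image, radius, mode="wrap")
    # Filter along rows first, then along columns (the kernel is separable).
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def high_freq_energy(image: np.ndarray, cutoff: float = 0.25) -> float:
    """Spectral energy at radial frequencies above `cutoff` (cycles/pixel)."""
    spectrum = np.fft.fft2(image)
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    mask = np.sqrt(fx ** 2 + fy ** 2) > cutoff
    return float(np.sum(np.abs(spectrum[mask]) ** 2))

# Synthetic 32x32 "image" with purely low-frequency content.
yy, xx = np.mgrid[0:32, 0:32]
clean = np.sin(2 * np.pi * xx / 32) + np.cos(2 * np.pi * yy / 32)

# Hypothetical high-frequency trigger: a faint checkerboard pattern.
trigger = 0.1 * ((-1.0) ** (xx + yy))
poisoned = clean + trigger

blurred = gaussian_blur(poisoned, sigma=1.0)

energy_before = high_freq_energy(poisoned)
energy_after = high_freq_energy(blurred)
print("high-frequency energy ratio (after / before):", energy_after / energy_before)
print("max deviation from clean image:", np.max(np.abs(blurred - clean)))
```

In this toy setting the blurred image ends up closer to the clean image than the poisoned input was, which is the effect a blur-based defense relies on. Real backdoor triggers are not necessarily confined to the highest frequencies, so this illustrates the mechanism only and says nothing about the reported results.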

Paper Structure (26 sections, 10 equations, 2 figures, 7 tables)

Introduction
Preliminaries
Self-supervised and contrastive learning
Equivariance and invariance
Adversarial Perturbations
Method
Gaussian Blur
Frequency Patching (Freq Patch)
Extension to Supervised Learning
Experiments
Datasets
Classification metrics
RobustBench
Results
Conclusion
...and 11 more sections

Figures (2)

Figure 1: t-SNE clustering of CIFAR-10 embeddings obtained using SimCLR for a poisoned vanilla model (left) and a poisoned vanilla model equipped with the blur defense (right). For the poisoned model (left), a poisoned sample x is clustered with the target class (orange) irrespective of its actual class; the blur defense mitigates this, and x is clustered with its actual class (right).
Figure 2: Example of FIBA poisoning on two different images. The residuals, i.e., the differences between the original and poisoned images, vary between images. Scaled versions of the residuals are plotted for visualization.
