ProFeAT: Projected Feature Adversarial Training for Self-Supervised Learning of Robust Representations

Sravanti Addepalli; Priyam Dey; R. Venkatesh Babu

TL;DR

ProFeAT narrows the gap between self-supervised and supervised adversarial training with a teacher–student framework in which a frozen, pretrained projection head from the SSL teacher is used at both the teacher and the student during distillation. It applies attack and defense losses at both the feature and projector outputs, and pairs weak augmentations at the teacher with strong augmentations at the student. ProFeAT improves clean and robust accuracy over prior SSL-AT methods, especially on larger models such as WideResNet-34-10. The approach yields state-of-the-art results on CIFAR-10/100 benchmarks, demonstrates solid transfer performance, and maintains favorable compute compared to prior SSL-AT methods. Overall, ProFeAT provides scalable, robust representations that rival supervised adversarial training while reducing training complexity.

Abstract

The need for abundant labelled data in supervised Adversarial Training (AT) has prompted the use of Self-Supervised Learning (SSL) techniques with AT. However, the direct application of existing SSL methods to adversarial training has been sub-optimal due to the increased training complexity of combining SSL with AT. A recent approach, DeACL, mitigates this by utilizing supervision from a standard SSL teacher in a distillation setting, to mimic supervised AT. However, we find that there is still a large performance gap when compared to supervised adversarial training, specifically on larger models. In this work, we investigate the key reason for this gap and propose Projected Feature Adversarial Training (ProFeAT) to bridge it. We show that the sub-optimal distillation performance is a result of a mismatch in the training objectives of the teacher and student, and propose to use a projection head at the student that allows it to leverage weak supervision from the teacher while also being able to learn adversarially robust representations that are distinct from the teacher. We further propose appropriate attack and defense losses at the feature and projector, alongside a combination of weak and strong augmentations for the teacher and student respectively, to improve the training data diversity without increasing the training complexity. Through extensive experiments on several benchmark datasets and models, we demonstrate significant improvements in both clean and robust accuracy when compared to existing SSL-AT methods, setting a new state-of-the-art. We further report on-par or improved performance when compared to TRADES, a popular supervised-AT method.

Paper Structure (22 sections, 1 equation, 4 figures, 20 tables)

Introduction
Preliminaries
Related Works
Proposed Method
Projection Layer in Self-supervised Distillation
ProFeAT: Projected Feature Adversarial Training
Experiments and Results
Comparison with the state-of-the-art
Ablations
Conclusion
Background: Supervised Adversarial Defenses
Mechanism behind Scaling to Larger Datasets
Details on Datasets
Details on Training and Compute
Computational Complexity
...and 7 more sections

Figures (4)

Figure 1: Proposed approach (ProFeAT): The student is trained using a distillation loss on clean samples using supervision from an SSL pretrained teacher, and a smoothness loss to enforce adversarial robustness (details of the exact loss formulation are presented in the Proposed Method section). A frozen pretrained projection layer is used at the teacher and student to prevent overfitting to the clean distillation loss. The use of strong augmentations at the student increases attack diversity, while weak augmentations at the teacher reduce the training complexity.
Figure 2: Performance of ProFeAT when compared to DeACL across variation in the robustness-accuracy trade-off parameter $\beta$ on the CIFAR-100 dataset with the WRN-34-10 architecture.
Figure 3: Performance ($\%$) of ProFeAT by varying the weight $\lambda$ between the defense losses at the feature and projector. The performance is stable across the range $\lambda \in [0.25,0.75]$. We thus fix the value of $\lambda$ to 0.5 in the proposed approach.
Figure 4: Robust accuracy of a supervised TRADES model across random restarts of PGD 5-step attack (CIFAR-100, WRN-34-10).
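
The roles of $\beta$ and $\lambda$ in the captions above can be illustrated with a small sketch. It is not the paper's loss formulation, which is given in the Proposed Method section and is not reproduced here. It assumes that each defense loss is a clean distillation term plus $\beta$ times a smoothness term, that both terms are cosine distances, and that $\lambda$ weights the feature-level loss while $1 - \lambda$ weights the projector-level loss. The function names (`cosine`, `defense_loss`, `combined_loss`) are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def defense_loss(teacher, student_clean, student_adv, beta):
    """Distillation on clean samples plus beta-weighted smoothness.

    Assumption: both terms are cosine distances (1 - cosine similarity).
    """
    distill = 1.0 - cosine(teacher, student_clean)     # match the teacher on clean inputs
    smooth = 1.0 - cosine(student_clean, student_adv)  # keep clean and adversarial outputs close
    return distill + beta * smooth

def combined_loss(feat, proj, lam=0.5, beta=1.0):
    """Weighted sum of the feature-level and projector-level defense losses.

    feat and proj are (teacher, student_clean, student_adv) triples.
    Assumption: lam weights the feature loss, (1 - lam) the projector loss.
    """
    loss_feat = defense_loss(*feat, beta)
    loss_proj = defense_loss(*proj, beta)
    return lam * loss_feat + (1.0 - lam) * loss_proj
```

Under these assumptions, a larger `beta` puts more weight on agreement between clean and adversarial outputs relative to agreement with the teacher, and `lam=0.5` weights the feature and projector losses equally, matching the value fixed in Figure 3.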
