Can We Break Free from Strong Data Augmentations in Self-Supervised Learning?

Shruthi Gowda, Elahe Arani, Bahram Zonooz

TL;DR

This paper addresses the heavy dependence of self-supervised learning (SSL) on strong data augmentations and the associated biases that hamper transfer and robustness. It proposes SSL-Prior, a framework that injects prior knowledge about global shape through a separate prior network trained on a Sobel-filtered shape view, supervised by a KL-divergence consistency loss to the SSL module. Empirical results show that SSL-Prior reduces texture bias and shortcut learning, improves robustness to natural and adversarial perturbations, and enhances out-of-distribution generalization, while maintaining strong IID performance with basic augmentations. The approach also yields notable gains in downstream dense prediction tasks, such as object detection, suggesting practical scalability and real-world applicability for SSL in data-scarce and safety-critical domains.
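The two ingredients named above, a Sobel-filtered shape view and a KL-divergence consistency term, can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the function names, the zero-valued borders, the use of softmax over raw outputs, and the direction of the KL term (prior as target, SSL module as prediction) are all assumptions made for illustration.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical intensity gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_shape_view(image):
    """Gradient-magnitude map of a 2D grayscale image given as a list of rows.

    Border pixels are left at 0.0 (no padding); this is an assumption for brevity.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            gx = gy = 0.0
            for di in range(3):
                for dj in range(3):
                    pixel = image[i + di - 1][j + dj - 1]
                    gx += SOBEL_X[di][dj] * pixel
                    gy += SOBEL_Y[di][dj] * pixel
            out[i][j] = math.hypot(gx, gy)
    return out


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def consistency_loss(ssl_output, prior_output):
    """Hypothetical consistency term: KL(softmax(prior) || softmax(ssl)).

    The KL direction is an assumption; the summary only states that a
    KL-divergence consistency loss links the prior network and the SSL module.
    """
    return kl_divergence(softmax(prior_output), softmax(ssl_output))


# Usage: a vertical edge yields a strong response, a flat image yields none.
edge_image = [[0, 0, 1, 1] for _ in range(4)]
shape_view = sobel_shape_view(edge_image)
loss = consistency_loss([2.0, 0.0, 0.0], [0.0, 2.0, 0.0])
```

In a real training loop the shape view would be fed to the prior network, and a term like `consistency_loss` would be added to the SSL objective; how the two are weighted is not specified in this summary.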

Abstract

Self-supervised learning (SSL) has emerged as a promising solution for addressing the challenge of limited labeled data in deep neural networks (DNNs), offering scalability potential. However, the impact of design dependencies within the SSL framework remains insufficiently investigated. In this study, we comprehensively explore SSL behavior across a spectrum of augmentations, revealing their crucial role in shaping SSL model performance and learning mechanisms. Leveraging these insights, we propose a novel learning approach that integrates prior knowledge, with the aim of curtailing the need for extensive data augmentations and thereby amplifying the efficacy of learned representations. Notably, our findings underscore that SSL models imbued with prior knowledge exhibit reduced texture bias, diminished reliance on shortcuts and augmentations, and improved robustness against both natural and adversarial corruptions. These findings not only illuminate a new direction in SSL research, but also pave the way for enhancing DNN performance while concurrently alleviating the imperative for intensive data augmentation, thereby enhancing scalability and real-world problem-solving capabilities.

Paper Structure (26 sections, 6 equations, 8 figures, 7 tables, 1 algorithm)

Figures (8)

  • Figure 1: Augmentations are critical to SSL methods: removing strong augmentations from SSL training can cause a significant drop in performance.
  • Figure 2: Examples of aggressive augmentations producing noisy images in the CIFAR (first three columns) and ImageNet (last three columns) datasets. High-intensity augmentations such as color jitter, blurring, and solarization result in semantic shifts.
  • Figure 3: Schematic of SSL method with Prior knowledge integration. The SSL module can incorporate any SSL method, such as Contrastive, Asymmetric, and Feature Decorrelation-based, and one method is selected from each category for this study. The prior network extracts implicit semantic knowledge and supervises the SSL module network to learn better representations. The resulting network from the SSL module is then used for inference purposes. This approach is expected to improve the quality of learned features and enhance the generalization capability of the resulting network.
  • Figure 4: Shortcut learning: Evaluation on the Tinted-STL10 and Skewed-CelebA datasets for all three SSL methods. The results indicate that SSL models trained with priors are less vulnerable to learning unintended cues and spurious correlations in the data.
  • Figure 5: Robustness analysis: PGD attack on models fine-tuned on the CIFAR10 dataset. The SSL methods with Prior are more robust than the baseline across varying attack strengths.
  • ...and 3 more figures