
SASSL: Enhancing Self-Supervised Learning via Neural Style Transfer

Renan A. Rojas-Gomez, Karan Singhal, Ali Etemad, Alex Bijamov, Warren R. Morningstar, Philip Andrew Mansfield

TL;DR

SASSL boosts top-1 image classification accuracy on ImageNet by up to 2 percentage points compared to established self-supervised methods like MoCo, SimCLR, and BYOL, while achieving superior transfer learning performance across various datasets.

Abstract

Existing data augmentation in self-supervised learning, while diverse, fails to preserve the inherent structure of natural images. This produces distorted augmented samples with compromised semantic information, ultimately hurting downstream performance. To overcome this limitation, we propose SASSL: Style Augmentations for Self-Supervised Learning, a novel data augmentation technique based on Neural Style Transfer. SASSL decouples semantic and stylistic attributes in images and applies transformations exclusively to their style while preserving content, generating diverse samples that better retain semantic information. SASSL boosts top-1 image classification accuracy on ImageNet by up to 2 percentage points compared to established self-supervised methods like MoCo, SimCLR, and BYOL, while achieving superior transfer learning performance across various datasets. Because SASSL can be performed asynchronously as part of the data augmentation pipeline, these performance gains come with no change in pretraining throughput.
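
The abstract describes separating an image's content from its style and transforming only the style. As a simpler, swapped-in illustration of that idea (not the paper's Fast Style Transfer network), the sketch below uses adaptive instance normalization (AdaIN), which treats channel-wise feature means and standard deviations as "style" and the normalized features as "content". The function names, the (C, H, W) feature layout, and the probability `p` of applying the augmentation are assumptions for illustration only.

```python
import numpy as np

# Hypothetical sketch: AdaIN stands in for the style transfer model used in the paper.
def adain(content_feats: np.ndarray, style_feats: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Give content features (C, H, W) the channel-wise mean/std of style features (C, H', W')."""
    c_mean = content_feats.mean(axis=(1, 2), keepdims=True)
    c_std = content_feats.std(axis=(1, 2), keepdims=True) + eps
    s_mean = style_feats.mean(axis=(1, 2), keepdims=True)
    s_std = style_feats.std(axis=(1, 2), keepdims=True) + eps
    # Removing per-channel statistics keeps the spatial ("content") structure.
    normalized = (content_feats - c_mean) / c_std
    # Re-scaling with the reference statistics injects the external style.
    return normalized * s_std + s_mean


def maybe_stylize(content_feats: np.ndarray, style_feats: np.ndarray,
                  p: float, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical pipeline step: apply the style augmentation with probability p."""
    if rng.random() < p:
        return adain(content_feats, style_feats)
    return content_feats


rng = np.random.default_rng(0)
content = rng.normal(size=(3, 8, 8))
style = rng.normal(loc=2.0, scale=0.5, size=(3, 16, 16))
stylized = maybe_stylize(content, style, p=1.0, rng=rng)
print(stylized.mean(axis=(1, 2)))  # matches style.mean(axis=(1, 2)) per channel
```

Because AdaIN only swaps per-channel statistics, the spatial layout that carries object shape is left intact, mirroring the content-preserving goal stated above; a learned decoder (not shown) would map the re-styled features back to pixels.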

Paper Structure (23 sections, 9 equations, 5 figures, 13 tables, 1 algorithm)

Figures (5)

  • Figure 1: Towards diverse SSL data augmentation via Neural Style Transfer. We propose SASSL, a novel augmentation technique that leverages Style Transfer to create semantically aware pretraining views, modifying only the image's appearance while preserving its content. SASSL combines the image's content with the texture of an external reference style, generating augmented views that better retain the image's semantic meaning. By incorporating Style Transfer into traditional SSL augmentation pipelines and controlling the stylization strength through gradual blending of style features and pixel values, SASSL yields stronger representations than well-established SSL methods.
  • Figure 1: SASSL $+$ MoCo v2 downstream classification performance on ImageNet. Linear probing accuracy ($\%$) of a ResNet-50 backbone pretrained using SASSL $+$ MoCo v2. Mean accuracy reported over five random trials.
  • Figure 2: Feature blending and image interpolation. Fine-grained control over the final stylized image is obtained via interpolation and blending factors $\alpha$ and $\beta$, which operate in the feature and pixel domains. This prevents augmented views from losing semantic information due to overly strong transformations.
  • Figure 3: SASSL $+$ MoCo v2 few-shot learning performance. One- and ten-shot top-1 classification accuracy ($\%$) of representations learned via SASSL $+$ MoCo v2. Accuracy is reported for a single trial.
  • Figure 4: t-SNE visualization of style representations. Two-dimensional embeddings of the style representations of different datasets, extracted by the Fast Style Transfer method. Style embeddings of the Diabetic Retinopathy dataset (marked in yellow) form clusters that do not overlap with those of the other datasets, while embeddings from the remaining datasets lie close to each other.
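
Figure 2 describes controlling stylization strength with two factors. Below is a minimal sketch, assuming (as the caption's ordering suggests) that $\alpha$ linearly blends original and stylized features and $\beta$ linearly interpolates original and stylized pixels. The helper names and the `decode` callable, which stands in for the style transfer network's decoder, are hypothetical, and linear mixing is an illustrative choice rather than the authors' exact formulation.

```python
from typing import Callable

import numpy as np


def _check_unit(name: str, value: float) -> None:
    # Both factors are treated as mixing weights in [0, 1].
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must lie in [0, 1], got {value}")


def blend_features(content_feats: np.ndarray, stylized_feats: np.ndarray, alpha: float) -> np.ndarray:
    """Feature domain: alpha=0 keeps the original features, alpha=1 uses fully stylized ones."""
    _check_unit("alpha", alpha)
    return alpha * stylized_feats + (1.0 - alpha) * content_feats


def interpolate_pixels(image: np.ndarray, stylized_image: np.ndarray, beta: float) -> np.ndarray:
    """Pixel domain: beta=0 returns the input image, beta=1 the decoded stylized image."""
    _check_unit("beta", beta)
    return beta * stylized_image + (1.0 - beta) * image


def controlled_stylize(image: np.ndarray, content_feats: np.ndarray, stylized_feats: np.ndarray,
                       decode: Callable[[np.ndarray], np.ndarray],
                       alpha: float, beta: float) -> np.ndarray:
    """Hypothetical two-stage control: blend features, decode them, then mix with the input pixels."""
    blended = blend_features(content_feats, stylized_feats, alpha)
    return interpolate_pixels(image, decode(blended), beta)


# Toy usage: features share the image's shape and decode is the identity.
rng = np.random.default_rng(0)
image = rng.uniform(size=(4, 4, 3))
stylized_feats = rng.uniform(size=(4, 4, 3))
view = controlled_stylize(image, image, stylized_feats, decode=lambda f: f, alpha=0.5, beta=0.5)
```

In this linear toy setting, $\alpha = \beta = 0.5$ yields a view that is 25% stylized and 75% original, showing how the two factors compound to keep augmented views close to the source image.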