Improving out-of-distribution generalization via multi-task self-supervised pretraining

Isabela Albuquerque, Nikhil Naik, Junnan Li, Nitish Keskar, Richard Socher

TL;DR

The paper investigates domain generalization in computer vision and demonstrates that multi-task self-supervised pretraining can match or surpass supervised pretraining on unseen domains. It introduces a novel Gabor filter response reconstruction task, trained alongside Rotation and DeepCluster with a shared encoder, followed by supervised fine-tuning on source domains. Multi-task SSL outperforms supervised learning on the PACS benchmark and performs comparably on VLCS, especially under large domain shifts, and it yields better object localization than supervised baselines. The work also shows that SSL features can be combined with methods such as IRM, highlighting SSL as a robust foundation for domain generalization and cross-domain transfer with limited labeled data.

Abstract

Self-supervised feature representations have been shown to be useful for supervised classification, few-shot learning, and adversarial robustness. We show that features obtained using self-supervised learning are comparable to, or better than, supervised learning for domain generalization in computer vision. We introduce a new self-supervised pretext task of predicting responses to Gabor filter banks and demonstrate that multi-task learning of compatible pretext tasks improves domain generalization performance compared to training individual tasks alone. When the domain shift between training and test distributions is larger, features learnt through self-supervision generalize better to unseen domains than their supervised counterparts, and they even show better localization ability for objects of interest. Self-supervised feature representations can also be combined with other domain generalization methods to further boost performance.
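
The Gabor pretext task mentioned above uses an image's responses to a bank of Gabor filters as prediction targets. The following is a minimal sketch of how such targets could be computed, using only the Python standard library. The kernel size, sigma, wavelength, aspect ratio, number of orientations, the mean-squared-error loss, and all function names are illustrative assumptions, not the settings or code used in the paper.

```python
import math


def gabor_kernel(size, theta, sigma=2.0, wavelength=4.0, gamma=0.5, psi=0.0):
    """Real part of a 2-D Gabor filter as a size x size list of lists.

    size is assumed odd; all parameter defaults are hypothetical.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the sinusoidal carrier runs along theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
            carrier = math.cos(2.0 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def filter_response(image, kernel):
    """Valid-mode cross-correlation of a grayscale image with a kernel."""
    k = len(kernel)
    height, width = len(image), len(image[0])
    out = []
    for i in range(height - k + 1):
        out_row = []
        for j in range(width - k + 1):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            out_row.append(acc)
        out.append(out_row)
    return out


def gabor_bank_targets(image, n_orientations=4, size=7):
    """One response map per orientation; these serve as regression targets."""
    thetas = [math.pi * k / n_orientations for k in range(n_orientations)]
    return [filter_response(image, gabor_kernel(size, t)) for t in thetas]


def mse(pred, target):
    """Mean squared error between two stacks of response maps."""
    total, count = 0.0, 0
    for p_map, t_map in zip(pred, target):
        for p_row, t_row in zip(p_map, t_map):
            for p, t in zip(p_row, t_row):
                total += (p - t) ** 2
                count += 1
    return total / count


# Toy 12x12 grayscale image with vertical stripes of period 4.
image = [[1.0 if (j % 4) < 2 else 0.0 for j in range(12)] for _ in range(12)]
targets = gabor_bank_targets(image)
```

In a pretraining setup of this kind, a decoder head on top of the shared encoder would output maps of the same shape as `targets`, and a loss such as `mse` would be minimized. Because the targets are computed from the image itself, no labels are required.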

Paper Structure

This paper contains 26 sections, 1 equation, 11 figures, 6 tables.

Figures (11)

  • Figure 1: Illustration of the training scheme. Left: Self-supervised pretraining with multiple tasks. The feature extractor is shared and is updated through the loss of all tasks. Right: Supervised finetuning for the domain generalization.
  • Figure 2: Gabor filter response reconstruction task. Left: Prediction by a model trained with the Gabor filter response reconstruction task alone. Right: Prediction by a model simultaneously trained with DeepCluster, Rotation, and the Gabor filter response reconstruction task.
  • Figure 2: Domain generalization performance on the PACS benchmark. Multi-task self-supervised learning outperforms supervised learning on PACS. Accuracy reported in percent. Bolded value indicates best model for the target domain.
  • Figure 3: Examples: ImageNet and PACS.
  • Figure 3: Domain generalization performance on the VLCS benchmark. Multi-task self-supervised learning performs comparably to supervised learning on VLCS. Accuracy reported in percent. Bolded value indicates best model for the target domain.
  • ...and 6 more figures