A Survey on Self-supervised Learning: Algorithms, Applications, and Future Trends

Jie Gui, Tuo Chen, Jing Zhang, Qiong Cao, Zhenan Sun, Hao Luo, Dacheng Tao

TL;DR

This survey organizes the rapidly growing self-supervised learning (SSL) literature by algorithms, applications, and future directions. It classifies SSL methods as context-based, contrastive, generative, or contrastive-generative, and discusses their combination with GANs, semi-supervised learning, multi-instance learning (MIL), and multi-view/multi-modal learning. Empirically, contrastive learning offers strong linear-probe performance, while masked image modeling (MIM) typically yields better fine-tuning results, with implications for resource use and scalability. The paper highlights three trends (theoretical unification of SSL methods, multimodal and transformer-based unified SSL models, and automated design of effective pretext tasks) along with open questions about data efficiency, modalities, and practical method selection.

Abstract

Deep supervised learning algorithms typically require a large volume of labeled data to achieve satisfactory performance. However, the process of collecting and labeling such data can be expensive and time-consuming. Self-supervised learning (SSL), a subset of unsupervised learning, aims to learn discriminative features from unlabeled data without relying on human-annotated labels. SSL has garnered significant attention recently, leading to the development of numerous related algorithms. However, there is a dearth of comprehensive studies that elucidate the connections and evolution of different SSL variants. This paper presents a review of diverse SSL methods, encompassing algorithmic aspects, application domains, three key trends, and open research questions. Firstly, we provide a detailed introduction to the motivations behind most SSL algorithms and compare their commonalities and differences. Secondly, we explore representative applications of SSL in domains such as image processing, computer vision, and natural language processing. Lastly, we discuss the three primary trends observed in SSL research and highlight the open questions that remain. A curated collection of valuable resources can be accessed at https://github.com/guijiejie/SSL.

Paper Structure

This paper contains 40 sections, 29 equations, 8 figures, 6 tables.

Figures (8)

  • Figure 1: The general pipeline of applying SSL methods to downstream tasks. The SSL models are first pre-trained on the unlabeled data and then fine-tuned, or directly evaluated, on the labeled data of the downstream tasks.
  • Figure 2: Google Scholar search results for "self-supervised learning". The vertical and horizontal axes denote the number of SSL publications and the year, respectively.
  • Figure 3: The differences among supervised learning, unsupervised learning, and SSL. The image is reproduced from de1994learning. SSL utilizes freely derived labels as supervision instead of manually annotated labels.
  • Figure 4: Illustration of three common context-based methods: rotation, jigsaw, and colorization.
  • Figure 5: Illustration of different CL methods: CL based on negative examples (left), CL based on self-distillation (middle), and CL based on feature decorrelation (right). For the concepts of similarity and dissimilarity, see wang2020understanding and chen2020simple; for decorrelation, see zbontar2021barlow and bardes2021vicreg.
  • ...and 3 more figures
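
The "freely derived labels" mentioned in the Figure 3 caption, and the rotation task named in Figure 4, can be illustrated with a small sketch. It is not taken from the paper: the function name make_rotation_batch, the four-way rotation set (0, 90, 180, 270 degrees), and the random stand-in images are all assumptions chosen for illustration, and the images are assumed to be square so that every rotation keeps the same shape.

```python
import numpy as np


def make_rotation_batch(images):
    """Build a rotation-prediction batch from unlabeled square images.

    images: array of shape (N, H, W, C) with H == W (assumed).
    Returns (rotated_images, labels), where labels[i] in {0, 1, 2, 3}
    is the number of quarter-turns applied to rotated_images[i].
    """
    rotated, labels = [], []
    for k in range(4):  # k quarter-turns: 0, 90, 180, 270 degrees
        # Rotate every image in the batch in its (H, W) plane.
        rotated.append(np.rot90(images, k=k, axes=(1, 2)))
        # The label is derived from the transformation, not from a human.
        labels.append(np.full(len(images), k, dtype=np.int64))
    return np.concatenate(rotated), np.concatenate(labels)


# Hypothetical stand-in for an unlabeled dataset: 8 random 32x32 RGB images.
rng = np.random.default_rng(0)
unlabeled = rng.random((8, 32, 32, 3))

x, y = make_rotation_batch(unlabeled)
print(x.shape, y.shape)  # (32, 32, 32, 3) (32,)
```

A classifier trained to predict y from x receives a supervised signal without any manual annotation; in the pipeline of Figure 1, its encoder would then be fine-tuned or directly evaluated on the labeled downstream data.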