Self-supervised Learning: Generative or Contrastive

Xiao Liu, Fanjin Zhang, Zhenyu Hou, Zhaoyu Wang, Li Mian, Jing Zhang, Jie Tang

TL;DR

Self-supervised learning reduces reliance on labeled data by exploiting intrinsic data structure to learn representations. The paper organizes SSL methods into three categories—generative, contrastive, and generative-contrastive—covering CV, NLP, and graph domains, and surveys empirical approaches alongside theoretical analyses. It highlights trade-offs between reconstruction-based generative methods and discriminative contrastive approaches, and discusses how semi-supervised self-training can complement SSL. The survey identifies open problems such as cross-domain transfer, task design automation, and deeper theoretical understanding to guide future research.

Abstract

Deep supervised learning has achieved great success in the last decade. However, its dependence on manual labels and its vulnerability to attacks have driven people to explore better solutions. As an alternative, self-supervised learning has attracted many researchers for its soaring performance on representation learning in the last several years. Self-supervised representation learning leverages the input data itself as supervision and benefits almost all types of downstream tasks. In this survey, we take a look into new self-supervised learning methods for representation in computer vision, natural language processing, and graph learning. We comprehensively review the existing empirical methods and summarize them into three main categories according to their objectives: generative, contrastive, and generative-contrastive (adversarial). We further investigate related theoretical analysis work to provide deeper thoughts on how self-supervised learning works. Finally, we briefly discuss open problems and future directions for self-supervised learning. An outline slide for the survey is provided.

Paper Structure

This paper contains 43 sections, 37 equations, 19 figures, 1 table.

Figures (19)

  • Figure 1: An illustration distinguishing the supervised, unsupervised, and self-supervised learning frameworks. In self-supervised learning, the "related information" could be another modality, parts of the inputs, or another form of the inputs. Repainted from de1994learning.
  • Figure 2: Number of publications and citations on self-supervised learning during 2012-2020, from Microsoft Academic (sinha2015overview, zhang2019oag). Self-supervised learning has drawn tremendous attention in recent years.
  • Figure 3: Categorization of Self-supervised learning (SSL): Generative, Contrastive and Generative-Contrastive (Adversarial).
  • Figure 4: Conceptual comparison between Generative, Contrastive, and Generative-Contrastive methods.
  • Figure 5: Architecture of VQ-VAE van2017neural. Compared to VAE, the original hidden distribution is replaced with a quantized vector dictionary. In addition, the prior distribution is replaced with a pre-trained PixelCNN that models the hierarchical features of images. Taken from van2017neural.
  • ...and 14 more figures