Self-supervised Learning: Generative or Contrastive
Xiao Liu, Fanjin Zhang, Zhenyu Hou, Zhaoyu Wang, Li Mian, Jing Zhang, Jie Tang
TL;DR
Self-supervised learning reduces reliance on labeled data by exploiting intrinsic data structure to learn representations. The paper organizes SSL methods into three categories—generative, contrastive, and generative-contrastive—covering CV, NLP, and graph domains, and surveys empirical approaches alongside theoretical analyses. It highlights trade-offs between reconstruction-based generative methods and discriminative contrastive approaches, and discusses how semi-supervised self-training can complement SSL. The survey identifies open problems such as cross-domain transfer, task design automation, and deeper theoretical understanding to guide future research.
Abstract
Deep supervised learning has achieved great success in the last decade. However, its dependence on manual labels and its vulnerability to attacks have driven researchers to seek alternatives. Self-supervised learning is one such alternative, and it has attracted many researchers because of its rapidly improving performance on representation learning over the last several years. Self-supervised representation learning uses the input data itself as supervision and benefits almost all types of downstream tasks. In this survey, we examine recent self-supervised learning methods for representation learning in computer vision, natural language processing, and graph learning. We comprehensively review the existing empirical methods and group them into three main categories according to their objectives: generative, contrastive, and generative-contrastive (adversarial). We further examine related theoretical analyses to give deeper insight into how self-supervised learning works. Finally, we briefly discuss open problems and future directions for self-supervised learning. An outline slide for the survey is provided.
