A Review on Discriminative Self-supervised Learning Methods in Computer Vision
Nikolaos Giakoumoglou, Tania Stathaki, Athanasios Gkelias
TL;DR
The paper surveys discriminative self-supervised learning (SSL) for computer vision, organizing methods into five families: contrastive, clustering, self-distillation, knowledge distillation, and feature decorrelation. It systematically analyzes architectural choices, pretext tasks, and loss functions, and evaluates methods with linear and semi-supervised protocols on ImageNet-1K, as well as by transfer to a wide range of downstream classification and other vision tasks. Key findings highlight strong linear and transfer performance from methods such as ReLIC-v2, TWIST, and BYOL-based frameworks, while also underscoring challenges in scalability, robustness, and domain shift. The work emphasizes the need for efficient, broadly applicable SSL techniques, improved benchmarking, and theoretically grounded objectives to guide future research and practical deployment.
Abstract
Self-supervised learning (SSL) has rapidly emerged as a transformative approach in computer vision, enabling the extraction of rich feature representations from vast amounts of unlabeled data and reducing reliance on costly manual annotations. This review presents a comprehensive analysis of discriminative SSL methods, which focus on learning representations by solving pretext tasks that do not require human labels. The paper systematically categorizes discriminative SSL approaches into five main groups: contrastive methods, clustering methods, self-distillation methods, knowledge distillation methods, and feature decorrelation methods. For each category, the review details the underlying principles, architectural components, loss functions, and representative algorithms, highlighting their unique mechanisms and contributions to the field. Extensive comparative evaluations are provided, including linear and semi-supervised protocols on standard benchmarks such as ImageNet, as well as transfer learning performance across diverse downstream tasks. The review also discusses theoretical foundations, scalability, efficiency, and practical challenges, such as computational demands and accessibility. By synthesizing recent advancements and identifying key trends, open challenges, and future research directions, this work serves as a valuable resource for researchers and practitioners aiming to leverage discriminative SSL for robust and generalizable computer vision models.
