Adv-SSL: Adversarial Self-Supervised Representation Learning with Theoretical Guarantees
Chenguang Duan, Yuling Jiao, Huazhen Lin, Wensen Ma, Jerry Zhijian Yang
TL;DR
Adv-SSL addresses the bias inherent in covariance-regularized self-supervised learning by replacing the biased estimator with an unbiased minimax formulation that comes with end-to-end transfer guarantees. The method learns representations via a min-max objective that couples an alignment term with a regularizer and is optimized by alternating updates with a detach trick, at no additional cost compared to its biased counterparts. The authors prove that, with sufficient unlabeled upstream data and robust augmentations, the learned embedding forms well-separated clusters, enabling strong downstream classification even with limited labels. Empirically, Adv-SSL outperforms prior biased methods on CIFAR-10/100 and Tiny ImageNet, and the theory clarifies how the amount of unlabeled data and the quality of augmentations drive few-shot performance.
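The following is a minimal numerical sketch of why a minimax form can remove mini-batch bias. It assumes, which the summary above does not spell out, that the regularizer measures the Frobenius distance between a cross-view covariance-type matrix and the identity, written in the dual form ||A||_F = max over ||G||_F <= 1 of <A, G>_F. The function names, the synthetic embeddings, and this particular dual form are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def cross_covariance(z1, z2):
    """Sample cross-view matrix (1/n) * z1^T z2 for two batches of embeddings."""
    return z1.T @ z2 / z1.shape[0]


def plug_in_penalty(z1, z2):
    """Squared Frobenius distance to the identity (nonlinear in the sample mean)."""
    d = z1.shape[1]
    return np.sum((cross_covariance(z1, z2) - np.eye(d)) ** 2)


def dual_penalty(z1, z2, G):
    """Inner product <C_hat - I, G>_F, which is linear in the data for a fixed G."""
    d = z1.shape[1]
    return np.sum((cross_covariance(z1, z2) - np.eye(d)) * G)


def best_dual(z1, z2):
    """Closed-form maximizer over the unit Frobenius ball: G = R / ||R||_F."""
    d = z1.shape[1]
    R = cross_covariance(z1, z2) - np.eye(d)
    norm = np.linalg.norm(R)  # Frobenius norm for a 2-D array
    return R / norm if norm > 0 else R


# Synthetic stand-ins for the embeddings of two augmented views (assumption).
rng = np.random.default_rng(0)
n, d, batch = 512, 4, 32
z1 = rng.normal(size=(n, d))
z2 = z1 + 0.1 * rng.normal(size=(n, d))

# Inner "max" step: solve for G once, then treat it as a constant.
G = best_dual(z1, z2)

# Outer step sees only mini-batches; compare batch averages with the full data.
batches = [(z1[i:i + batch], z2[i:i + batch]) for i in range(0, n, batch)]
full_plug = plug_in_penalty(z1, z2)
mean_plug = np.mean([plug_in_penalty(a, b) for a, b in batches])
full_dual = dual_penalty(z1, z2, G)
mean_dual = np.mean([dual_penalty(a, b, G) for a, b in batches])

print("plug-in: full =", full_plug, " mini-batch average =", mean_plug)
print("dual:    full =", full_dual, " mini-batch average =", mean_dual)
```

With G held fixed, the mini-batch average of the dual penalty matches the full-data value, while the mini-batch average of the plug-in penalty is inflated. In an autodiff framework, holding G constant during the encoder update is the role a detach (stop-gradient) would play; NumPy has no gradients, so here G is simply computed once and reused.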
Abstract
Learning transferable data representations from abundant unlabeled data remains a central challenge in machine learning. Although numerous self-supervised learning methods have been proposed to address this challenge, a significant class of these approaches aligns the covariance or correlation matrix with the identity matrix. Despite impressive performance across various downstream tasks, these methods often suffer from biased sample risk, leading to substantial optimization shifts in mini-batch settings and complicating theoretical analysis. In this paper, we introduce a novel Adversarial Self-Supervised Representation Learning (Adv-SSL) for unbiased transfer learning with no additional cost compared to its biased counterparts. Our approach not only outperforms the existing methods across multiple benchmark datasets but is also supported by comprehensive end-to-end theoretical guarantees. Our analysis reveals that the minimax optimization in Adv-SSL encourages representations to form well-separated clusters in the embedding space, provided there is sufficient upstream unlabeled data. As a result, our method achieves strong classification performance even with limited downstream labels, shedding new light on few-shot learning.
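To make the phrase "biased sample risk" concrete, the standard bias-variance decomposition below shows what happens when a squared Frobenius penalty is evaluated on a mini-batch estimate. The notation is assumed for illustration and is not taken from the abstract: \widehat{C} denotes a mini-batch estimate of a covariance-type matrix C with E[\widehat{C}] = C, and I_d is the d-by-d identity.

```latex
% Assumed notation: \widehat{C} is an unbiased mini-batch estimate of C,
% \delta_{ij} is the Kronecker delta (the entries of I_d).
\begin{align*}
\mathbb{E}\,\bigl\|\widehat{C} - I_d\bigr\|_F^2
  &= \sum_{i,j} \mathbb{E}\bigl(\widehat{C}_{ij} - \delta_{ij}\bigr)^2 \\
  &= \sum_{i,j} \Bigl[\bigl(C_{ij} - \delta_{ij}\bigr)^2
       + \operatorname{Var}\bigl(\widehat{C}_{ij}\bigr)\Bigr] \\
  &= \bigl\|C - I_d\bigr\|_F^2
       + \sum_{i,j} \operatorname{Var}\bigl(\widehat{C}_{ij}\bigr).
\end{align*}
```

The extra variance term is nonnegative and, for a batch of n independent samples, typically of order 1/n, so the mini-batch risk overestimates the population risk most when batches are small. This is one way to read the "optimization shifts in mini-batch settings" mentioned above.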
