Variational Self-Supervised Contrastive Learning Using Beta Divergence
Mehmet Can Yavuz, Berrin Yanikoglu
TL;DR
The paper tackles learning robust representations from unlabeled, noisy multi-label data by proposing Variational Contrastive Learning (VCL), which combines a variational encoder with a beta-divergence-based objective. The method adds a Gaussian sampling head and three loss terms that together form the total objective $\mathcal{L}_{total}$, which enforces discriminative clustering while regularizing the latent distributions. Empirically, VCL outperforms state-of-the-art self-supervised methods on CelebA and the noisy YFCC-CelebA dataset under linear evaluation and low-data fine-tuning, including scenarios where pretraining on noisy web data yields higher downstream accuracy. The approach supports robust, scalable pretraining for multi-label face attribute recognition in real-world, noisy data settings, with notable improvements when labeled data are limited.
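Variational encoders usually implement a Gaussian sampling head like the one mentioned above with the reparameterization trick. The sketch below assumes the encoder outputs a mean vector `mu` and a log-variance vector `log_var` for each image. It draws z = mu + sigma * eps so that gradients can flow through the sample. The function names are hypothetical and do not come from the paper. The KL term shown is the standard closed-form regularizer toward N(0, I). It is only an illustration of latent-distribution regularization, and the paper's actual loss terms may differ.

```python
import math
import random


def sample_gaussian(mu, log_var, rng):
    """Reparameterized sample z = mu + sigma * eps, with eps ~ N(0, 1) per dimension.

    Hypothetical helper: mu and log_var are assumed to come from two heads
    on top of the encoder's feature vector.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ).

    Per dimension: 0.5 * (mu^2 + sigma^2 - log sigma^2 - 1).
    Illustrative regularizer only, not necessarily the paper's loss term.
    """
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


# Usage: one latent sample per image; the randomness is isolated in eps,
# so mu and log_var remain differentiable inputs in an autodiff framework.
rng = random.Random(0)
z = sample_gaussian([0.5, -1.0], [0.0, -2.0], rng)
reg = kl_to_standard_normal([0.5, -1.0], [0.0, -2.0])
```

Parameterizing the variance as `log_var` keeps sigma positive without a constrained output. This is a common design choice in variational models, not a detail confirmed by the paper.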
Abstract
Learning a discriminative semantic space from unlabelled and noisy data remains unaddressed in the multi-label setting. We present a contrastive self-supervised learning method, grounded in variational methods, that is robust to data noise. The method, Variational Contrastive Learning (VCL), uses beta-divergence to learn robustly from unlabelled datasets, including uncurated and noisy ones. We demonstrate its effectiveness through rigorous experiments, including linear evaluation and fine-tuning, on multi-label datasets in the face understanding domain. In almost all tested scenarios, VCL surpasses state-of-the-art self-supervised methods, achieving a noteworthy increase in accuracy.
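The abstract credits beta-divergence with VCL's robustness to noise. One common form of beta-divergence is often called the density power divergence. The sketch below implements it for two discrete distributions p and q:

d_beta(p, q) = sum_x [ q^(1+beta) - (1 + 1/beta) * p * q^beta + (1/beta) * p^(1+beta) ]

This divergence tends to KL(p || q) as beta -> 0. It is offered only as an illustration of why beta-divergences resist outliers. The paper's exact formulation and its choice of beta are not given here and may differ. The function names are hypothetical.

```python
import math


def beta_divergence(p, q, beta):
    """Density power divergence (one form of beta-divergence) between discrete p and q.

    Assumes p and q are probability vectors of equal length and beta > 0.
    The q**beta factor down-weights bins where q assigns little mass, which
    keeps the divergence bounded (by 1 + 1/beta) even when p has mass where q is ~0.
    """
    if beta <= 0:
        raise ValueError("beta must be positive; the beta -> 0 limit is KL(p || q)")
    total = 0.0
    for pi, qi in zip(p, q):
        total += qi ** (1 + beta) - (1 + 1 / beta) * pi * qi ** beta + pi ** (1 + beta) / beta
    return total


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; bins with p == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Usage: a "contaminated" p puts 10% of its mass where the model q is nearly zero.
p_noisy = [0.9, 0.1]
q_model = [1.0 - 1e-30, 1e-30]
kl_value = kl_divergence(p_noisy, q_model)           # grows without bound as q_model[1] -> 0
bd_value = beta_divergence(p_noisy, q_model, 0.5)    # stays below 1 + 1/0.5 = 3
```

As q's mass on the contaminated bin shrinks, KL grows like -0.1 * log(q), while the beta-divergence stays bounded. This is the usual intuition for using beta-divergences with noisy data. Smaller beta behaves more like KL, and larger beta trades efficiency for robustness.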
