Variational Self-Supervised Contrastive Learning Using Beta Divergence

Mehmet Can Yavuz, Berrin Yanikoglu

TL;DR

The paper tackles learning robust representations from unlabeled, noisy multi-label data by proposing Variational Contrastive Learning (VCL), which combines a variational encoder with a beta-divergence-based objective. The method introduces a Gaussian sampling head and three loss terms, collectively forming the total objective $\mathcal{L}_{total}$ that enforces discriminative clustering while regularizing the latent distributions. Empirically, VCL outperforms state-of-the-art self-supervised methods on CelebA and the noisy YFCC-CelebA dataset in linear and low-data regimes, including scenarios where pretraining on noisy web data yields higher downstream accuracy. The approach demonstrates robust, scalable pretraining for multi-label face attribute recognition in real-world, noisy data settings, with notable improvements under limited labeled data.
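
The summary names a beta-divergence-based objective but does not give its formula. As an illustrative assumption, the sketch below implements the density power divergence of Basu et al., a standard member of the beta-divergence family, between two discrete distributions. It is not the authors' loss, and `beta_divergence` and `kl_divergence` are hypothetical helper names. Two checkable properties: as β → 0 the divergence approaches KL(p || q), and at β = 1 it reduces to the squared Euclidean distance between the probability vectors. In robust statistics, larger β gives less influence to observations the model assigns low density, which is the usual motivation for preferring it over KL under noisy data.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def beta_divergence(p, q, beta):
    """Density power divergence D_beta(p || q) (Basu et al.), for beta > 0.

    Hypothetical helper for illustration, not the paper's implementation:
    D_beta = sum_x [ q^(1+beta) - (1 + 1/beta) p q^beta + (1/beta) p^(1+beta) ]
    """
    if beta <= 0:
        raise ValueError("beta must be positive; the beta -> 0 limit is KL(p || q)")
    return sum(
        qi ** (1 + beta)
        - (1 + 1 / beta) * pi * qi ** beta
        + (1 / beta) * pi ** (1 + beta)
        for pi, qi in zip(p, q)
    )


p = [0.7, 0.2, 0.1]
q = [0.5, 0.3, 0.2]
print(beta_divergence(p, p, 0.5))  # identical distributions: zero up to float rounding
print(beta_divergence(p, q, 1.0))  # beta = 1: squared Euclidean distance sum((p - q)^2)
print(beta_divergence(p, q, 1e-4), kl_divergence(p, q))  # small beta: close to KL
```

The explicit β → 0 and β = 1 cases make it easy to sanity-check any variant of the formula before plugging it into a training loss.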

Abstract

Learning a discriminative semantic space using unlabelled and noisy data remains unaddressed in a multi-label setting. We present a contrastive self-supervised learning method that is robust to data noise and grounded in variational methods. The method, VCL, uses variational contrastive learning with beta-divergence to learn robustly from unlabelled datasets, including uncurated and noisy ones. We demonstrate its effectiveness through rigorous experiments, including linear evaluation and fine-tuning scenarios, on multi-label datasets in the face understanding domain. In almost all tested scenarios, VCL surpasses state-of-the-art self-supervised methods, achieving a noteworthy increase in accuracy.
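
The abstract reports linear evaluation on multi-label face-attribute data (CelebA annotates 40 binary attributes per image) but does not spell out the protocol. The sketch below assumes a common setup: a linear head on frozen encoder features with one logit per attribute, trained with per-attribute binary cross-entropy and scored by accuracy averaged over attributes. The function names and toy numbers are hypothetical.

```python
import math


def linear_probe_logits(features, weights, biases):
    """Linear head on frozen features: one logit per attribute, z_k = w_k . h + b_k."""
    return [sum(w * h for w, h in zip(w_k, features)) + b_k
            for w_k, b_k in zip(weights, biases)]


def multilabel_bce(logits, labels):
    """Mean binary cross-entropy over attributes, in the numerically stable form
    max(z, 0) - z * y + log(1 + exp(-|z|))."""
    losses = [max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
              for z, y in zip(logits, labels)]
    return sum(losses) / len(losses)


def mean_attribute_accuracy(all_logits, all_labels):
    """Per-attribute binary accuracy (logit >= 0, i.e. sigmoid >= 0.5), then averaged."""
    num_samples, num_attrs = len(all_labels), len(all_labels[0])
    correct = [0] * num_attrs
    for logits, labels in zip(all_logits, all_labels):
        for k, (z, y) in enumerate(zip(logits, labels)):
            correct[k] += int((1 if z >= 0 else 0) == y)
    per_attr = [c / num_samples for c in correct]
    return sum(per_attr) / num_attrs, per_attr


# Toy example: 2-D "frozen" features, 3 binary attributes (all values made up).
weights = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
biases = [0.0, 0.0, -1.0]
features = [[1.0, -0.5], [-1.0, 1.0]]
labels = [[1, 0, 1], [0, 1, 0]]
logits = [linear_probe_logits(h, weights, biases) for h in features]
mean_acc, per_attr = mean_attribute_accuracy(logits, labels)
print(per_attr, mean_acc)
print(multilabel_bce(logits[0], labels[0]))
```

Treating each attribute as an independent binary task is what distinguishes this multi-label probe from the single-label softmax probe common in ImageNet-style linear evaluation.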

Paper Structure (17 sections, 17 equations, 4 figures, 4 tables, 1 algorithm)

Figures (4)

  • Figure 1: Diagram of our proposed model, from left to right. First, two augmentations $x_i$ and $x_j$ are obtained from the input, and their latent vectors ($h_i$ and $h_j$) are extracted. Then the Gaussian sampling head learns the distribution parameters (mean and log variance) and samples from the learned distribution. Light-colored boxes indicate the same operation or shared weights.
  • Figure 2: Four different augmentations used in the contrastive design.
  • Figure 3: Random samples from the YFCC-CelebA dataset [yavuz2021yfcc].
  • Figure 4: Comparative performance of self-supervised pre-training models on multi-label tasks using the CelebA and YFCC-CelebA datasets. Bar colors distinguish the algorithms: orange bars represent VCL models, turquoise bars represent state-of-the-art models, and blue bars denote supervised baselines.
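
Figure 1 describes the Gaussian sampling head only by its outputs (mean and log variance) and a sampling step. A minimal sketch, assuming one linear layer per parameter and the standard reparameterization trick from variational autoencoders; the class name, dimensions, initialization, and the stand-in vectors for $h_i$ and $h_j$ are hypothetical, and a real model would use an autodiff framework rather than plain Python lists.

```python
import math
import random


class GaussianSamplingHead:
    """Maps a latent vector h to (mu, log_var) and samples z ~ N(mu, diag(exp(log_var))).

    Hypothetical sketch: one linear layer per parameter; the paper's layer
    sizes and initialization are not given in this summary.
    """

    def __init__(self, in_dim, out_dim, seed=0):
        init = random.Random(seed)
        scale = 1.0 / math.sqrt(in_dim)
        self.w_mu = [[init.uniform(-scale, scale) for _ in range(in_dim)]
                     for _ in range(out_dim)]
        self.w_logvar = [[init.uniform(-scale, scale) for _ in range(in_dim)]
                         for _ in range(out_dim)]
        self.b_mu = [0.0] * out_dim
        self.b_logvar = [0.0] * out_dim

    @staticmethod
    def _linear(weights, bias, h):
        return [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(weights, bias)]

    def __call__(self, h, rng):
        mu = self._linear(self.w_mu, self.b_mu, h)
        log_var = self._linear(self.w_logvar, self.b_logvar, h)
        # Reparameterization: z = mu + sigma * eps with eps ~ N(0, 1) and
        # sigma = exp(0.5 * log_var); in an autodiff framework this lets
        # gradients flow into mu and log_var through the sample.
        z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
             for m, lv in zip(mu, log_var)]
        return z, mu, log_var


# Both augmented views pass through the same head (shared weights, as in Figure 1).
head = GaussianSamplingHead(in_dim=4, out_dim=2)
rng = random.Random(42)
h_i = [0.5, -1.0, 0.25, 2.0]  # stand-in for the encoder output of view x_i
h_j = [0.4, -0.9, 0.30, 1.8]  # stand-in for the encoder output of view x_j
z_i, mu_i, logvar_i = head(h_i, rng)
z_j, mu_j, logvar_j = head(h_j, rng)
print(z_i, z_j)
```

Predicting the log variance rather than the variance keeps the standard deviation positive without an explicit constraint, since exp(0.5 * log_var) is always positive.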