Semi-Supervised Contrastive Learning with Orthonormal Prototypes

Huanran Li, Manh Nguyen, Daniel Pimentel-Alarcón

TL;DR

The paper addresses dimensional collapse in semi-supervised contrastive learning. It identifies a critical learning-rate threshold beyond which training with the InfoNCE loss converges to collapsed solutions. It then introduces CLOP, a loss that couples standard contrastive learning with a supervised term guiding embeddings toward orthonormal class prototypes, which increases embedding rank and class separability. In classification and transfer-learning experiments, CLOP shows higher accuracy and greater stability across learning rates and batch sizes, and ablations support the role of the orthonormal prototypes. The work offers a practical way to stabilize SSL representations in vision tasks; its noted limitations are the fixed number of prototypes and their initialization.
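Dimensional collapse can be made concrete by counting how many directions the embeddings actually occupy. The following is a minimal NumPy sketch, not the paper's evaluation protocol: the function name `effective_rank`, the tolerance `tol`, and the synthetic data are all assumptions chosen for illustration.

```python
import numpy as np

def effective_rank(embeddings, tol=1e-6):
    """Count singular values of the centered embeddings above tol * largest.

    Hypothetical diagnostic: a value well below the embedding dimension
    suggests the embeddings lie in a lower-dimensional subspace.
    """
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    s = np.linalg.svd(centered, compute_uv=False)
    if s[0] == 0:
        return 0
    return int(np.sum(s > tol * s[0]))

rng = np.random.default_rng(0)

# Embeddings that use all 16 dimensions.
full = rng.normal(size=(200, 16))

# Embeddings confined to a 3-dimensional subspace of the 16-dim space.
basis = rng.normal(size=(3, 16))
collapsed = rng.normal(size=(200, 3)) @ basis

print(effective_rank(full), effective_rank(collapsed))
```

On this synthetic data the first set occupies every dimension while the second occupies only three, which is the pattern the term "dimensional collapse" refers to.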

Abstract

Contrastive learning has emerged as a powerful method in deep learning, excelling at learning effective representations through contrasting samples from different distributions. However, dimensional collapse, where embeddings converge into a lower-dimensional space, poses a significant challenge, especially in semi-supervised and self-supervised setups. In this paper, we first identify a critical learning-rate threshold, beyond which standard contrastive losses converge to collapsed solutions. Building on these insights, we propose CLOP, a novel semi-supervised loss function designed to prevent dimensional collapse by promoting the formation of orthogonal linear subspaces among class embeddings. Through extensive experiments on real and synthetic datasets, we demonstrate that CLOP improves performance in image classification and object detection tasks while also exhibiting greater stability across different learning rates and batch sizes.
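The idea of pairing a contrastive loss with a supervised pull toward orthonormal class prototypes can be sketched as below. This summary does not reproduce CLOP's exact formulation, so everything here is an assumption made for illustration: the cosine-based `prototype_term`, the weight `lam`, the temperature, and the QR-based construction in `orthonormal_prototypes` are hypothetical choices, not the authors' definitions.

```python
import numpy as np

def normalize(x):
    """Scale each row to unit Euclidean norm."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def orthonormal_prototypes(num_classes, dim, seed=0):
    """Return (num_classes, dim) prototypes with orthonormal rows.

    Assumes num_classes <= dim; built from a reduced QR factorization
    of a random Gaussian matrix (an illustrative choice).
    """
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.normal(size=(dim, num_classes)))
    return q.T

def info_nce(z1, z2, temperature=0.5):
    """InfoNCE over two views; row i of z1 is the positive of row i of z2."""
    z1, z2 = normalize(z1), normalize(z2)
    logits = z1 @ z2.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

def prototype_term(z, labels, prototypes):
    """Mean (1 - cosine similarity) between embeddings and their class prototype."""
    z = normalize(z)
    return float(np.mean(1.0 - np.sum(z * prototypes[labels], axis=1)))

def combined_loss(z1, z2, labels, labeled_mask, prototypes, lam=1.0):
    """Contrastive loss on all samples plus the prototype term on labeled ones."""
    loss = info_nce(z1, z2)
    if labeled_mask.any():
        loss += lam * prototype_term(z1[labeled_mask], labels[labeled_mask], prototypes)
    return loss

# Usage on random data: 8 samples, 16-dim embeddings, 4 classes, half labeled.
rng = np.random.default_rng(1)
protos = orthonormal_prototypes(num_classes=4, dim=16)
z1 = rng.normal(size=(8, 16))
z2 = z1 + 0.1 * rng.normal(size=(8, 16))
labels = rng.integers(0, 4, size=8)
labeled_mask = np.arange(8) < 4
print(combined_loss(z1, z2, labels, labeled_mask, protos))
```

Because the prototypes are mutually orthogonal, embeddings pulled toward different prototypes are pushed toward orthogonal directions, which is one way to read the abstract's "orthogonal linear subspaces among class embeddings".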

Paper Structure

This paper contains 15 sections, 1 theorem, 41 equations, 4 figures, 8 tables.

Key Result

Lemma 1

Let $\mathcal{F}: \mathbb{R}^{m} \to \mathbb{R}^{m'}$ be a family of Contrastive Learning structures, where $m$ and $m'$ denote the dimensions of the inputs and embeddings, respectively. If a function $f \in \mathcal{F}$ is trained using the InfoNCE loss, then there exist infinitely many local stati

Figures (4)

  • Figure 1: Simulation with Repulsive Force on 50 simulated points in 50-dimensional space.
  • Figure 2: Top-1 classification accuracy on ImageNet across different learning rates and batch sizes. The percentage of labels used for supervised training is indicated in the legend.
  • Figure 3: Top-1 classification accuracy across different learning rates. The percentage of labels used for supervised training is indicated in the legend.
  • Figure 4: Top-1 classification accuracy across different batch sizes. The percentage of labels used for supervised training is indicated in the legend.

Theorems & Definitions (2)

  • Lemma 1
  • Proof of Lemma 1