Preventing Collapse in Contrastive Learning with Orthonormal Prototypes (CLOP)

Huanran Li, Manh Nguyen, Daniel Pimentel-Alarcón

TL;DR

This paper proposes CLOP, a novel semi-supervised loss function designed to prevent neural collapse by promoting the formation of orthogonal linear subspaces among class embeddings. By focusing on subspace separation, CLOP yields more distinguishable embeddings.

Abstract

Contrastive learning has emerged as a powerful method in deep learning, excelling at learning effective representations through contrasting samples from different distributions. However, neural collapse, where embeddings converge into a lower-dimensional space, poses a significant challenge, especially in semi-supervised and self-supervised setups. In this paper, we first theoretically analyze the effect of large learning rates on contrastive losses that solely rely on the cosine similarity metric, and derive a theoretical bound to mitigate this collapse. Building on these insights, we propose CLOP, a novel semi-supervised loss function designed to prevent neural collapse by promoting the formation of orthogonal linear subspaces among class embeddings. Unlike prior approaches that enforce a simplex ETF structure, CLOP focuses on subspace separation, leading to more distinguishable embeddings. Through extensive experiments on real and synthetic datasets, we demonstrate that CLOP enhances performance, providing greater stability across different learning rates and batch sizes.
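The abstract does not state the CLOP loss itself, so the following is only a minimal sketch of one way to read "orthonormal prototypes": fix one unit-norm prototype per class, make the prototypes mutually orthogonal, and penalize labeled embeddings for low cosine similarity to their class prototype. The function names, the QR-based construction, and the `1 - cosine` penalty are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def orthonormal_prototypes(num_classes, dim, seed=0):
    """Return a (num_classes, dim) matrix whose rows are orthonormal.

    Hypothetical helper: the prototypes are built from the reduced QR
    factorization of a random Gaussian matrix.
    """
    if num_classes > dim:
        raise ValueError("need dim >= num_classes for orthonormal prototypes")
    rng = np.random.default_rng(seed)
    # q has shape (dim, num_classes) with orthonormal columns.
    q, _ = np.linalg.qr(rng.normal(size=(dim, num_classes)))
    return q.T


def prototype_alignment_penalty(embeddings, labels, prototypes):
    """Mean of (1 - cosine similarity) between each labeled embedding and
    the prototype of its class. An assumed regularizer form, not the
    paper's definition."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    cos = np.sum(z * prototypes[labels], axis=1)
    return float(np.mean(1.0 - cos))


if __name__ == "__main__":
    protos = orthonormal_prototypes(num_classes=3, dim=5)
    labels = np.array([0, 1, 2])
    # Embeddings that sit exactly on their prototypes incur no penalty.
    print(prototype_alignment_penalty(protos.copy(), labels, protos))
    # Embeddings collapsed onto a single direction are penalized.
    collapsed = np.tile(protos[0], (3, 1))
    print(prototype_alignment_penalty(collapsed, labels, protos))
```

In a semi-supervised setup, a term of this kind would presumably be added to the contrastive loss on the labeled samples only. Because the prototypes are orthogonal, embeddings that all collapse onto a single direction cannot align with every class prototype at once.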

Paper Structure (16 sections, 3 theorems, 28 equations, 7 figures, 3 tables)

Key Result

Lemma 1

Let $\mathcal{F}: \mathbb{R}^{m} \to \mathbb{R}^{m'}$ be a family of Contrastive Learning structures, where $m$ and $m'$ denote the dimensions of the inputs and embeddings, respectively. If a function $f \in \mathcal{F}$ is trained using the InfoNCE loss, then there exist infinitely many local minim

Figures (7)

  • Figure 1: Illustration of global optima for InfoNCE and CLOP (this paper). For InfoNCE, global optima are reached when the model merges samples of the same class into a single embedding, whether the class arrangement is ETF (A) or co-linear (B). In contrast, the proposed CLOP introduces a novel regularizer that encourages embeddings to occupy a highly separable, full-rank space.
  • Figure 2: Illustration of the effect of repulsive force in contrastive learning. Light blue dots represent the individual class embedding, while the dark blue dot represents the mean of all class embeddings.
  • Figure 3: Numerical experiment on the tightness of Theorem \ref{theory-lrup}. Left & Middle: Singular value spectra of $\mathbf{X}$ at different training epochs (color-coded from blue to red). The Left panel shows successful optimization with a learning rate of $2.0$, while the Middle panel demonstrates optimization failure (complete collapse) at a learning rate of $2.1$. Right: The maximum learning rates that prevent collapse over 5 consecutive trials, for varying class embedding sizes, plotted against the theoretical upper bound ($\varepsilon = 0$) from Theorem \ref{theory-lrup}.
  • Figure 4: Numerical experiments illustrating dimensional collapse with $k$ class embeddings. The results show that minimizing total cosine similarity via gradient descent leads to convergence within a subspace of rank $k-1$ (left), while failing to preserve equal distances between the vectors (right).
  • Figure 5: Impact of avoiding dimensional collapse with CLOP (proposed method) on InfoNCE for contrastive learning. A 3-layer FFN is trained on synthetic data with 10% labeled samples, and the output embeddings are visualized in 3D. KNN classification accuracy ($k=5$) is reported, where the model is trained on 10% labeled data and tested on the remaining unlabeled data.
  • ...and 2 more figures
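The dimensional collapse described in the Figure 4 caption can be reproduced in a few lines. The sketch below is not the authors' experiment: the sizes `k=5`, `d=8`, the step size, and the use of projected gradient descent on the unit sphere are all assumptions. It relies on the identity that, for unit vectors $x_1, \dots, x_k$, the total pairwise cosine similarity equals $(\|\sum_i x_i\|^2 - k)/2$, so the minimum is attained when the vectors sum to zero, which forces them into a subspace of rank at most $k-1$.

```python
import numpy as np


def total_cosine_similarity(x):
    """Sum of cosine similarities over all unordered pairs of rows."""
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    gram = xn @ xn.T
    return float((gram.sum() - np.trace(gram)) / 2.0)


def minimize_total_cosine(k=5, d=8, lr=0.1, steps=2000, seed=0):
    """Projected gradient descent on k unit vectors in R^d (assumed setup)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(k, d))
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    for _ in range(steps):
        s = x.sum(axis=0)
        # Euclidean gradient for row i is the sum of the other rows.
        grad = s[None, :] - x
        # Remove the radial component so the step stays tangent to the sphere.
        grad -= np.sum(grad * x, axis=1, keepdims=True) * x
        x = x - lr * grad
        # Retract back onto the unit sphere.
        x /= np.linalg.norm(x, axis=1, keepdims=True)
    return x


if __name__ == "__main__":
    k = 5
    x = minimize_total_cosine(k=k)
    print("singular values:", np.linalg.svd(x, compute_uv=False))
    print("numerical rank:", np.linalg.matrix_rank(x, tol=1e-6))
    # Off-diagonal cosine similarities; their spread shows whether the
    # vectors ended up equidistant.
    gram = x @ x.T
    off_diag = gram[~np.eye(k, dtype=bool)]
    print("pairwise cosine range:", off_diag.min(), off_diag.max())
```

Printing the singular values makes the lost direction visible as one value near zero, and the range of pairwise similarities indicates whether the solution is equiangular, the two quantities the caption associates with the left and right panels.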

Theorems & Definitions (6)

  • Lemma 1
  • Theorem 1
  • Lemma 2
  • Proof of Theorem \ref{subspace-embeddings}
  • proof
  • proof