
Generalizing Supervised Contrastive learning: A Projection Perspective

Minoh Jeong, Alfred Hero

TL;DR

This work addresses the weak connection between supervised contrastive learning (SupCon) and mutual information (MI) by introducing ProjNCE, a projection-based generalization of the InfoNCE loss that yields a valid MI lower bound via an adjustment term. By allowing distinct projections for positives and negatives, ProjNCE unifies SupCon and InfoNCE and makes it possible to explore class-embedding strategies beyond the centroid. The authors analyze how SupCon fits within the ProjNCE framework and show that optimizing ProjNCE yields a tighter bound on MI. Experiments across vision and audio show consistent improvements over SupCon and standard cross-entropy. They further study projection methods (orthogonal, median, and MLP-based) and provide practical estimators (e.g., Nadaraya-Watson) to approximate conditional embeddings, achieving higher MI and better downstream accuracy. Overall, ProjNCE offers a broadly applicable enhancement for supervised contrastive learning that is robust to label noise and can incorporate auxiliary priors or side information through the design of the projections.
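As a rough illustration of how separate positive and negative projections enter an InfoNCE-style objective, the NumPy sketch below computes such a loss. It assumes cosine similarities scaled by a temperature `tau`, uses the per-class mean (centroid) or coordinate-wise median as hypothetical choices for $g_+$ and $g_-$, and omits the ProjNCE adjustment term, whose exact form is not given in this summary. The names `class_projection` and `proj_contrastive_loss` are illustrative, not the authors' code.

```python
import numpy as np


def normalize(x):
    """L2-normalize rows; all-zero rows (e.g. absent classes) stay zero."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), 1e-12)


def class_projection(z, labels, num_classes, method="mean"):
    """Hypothetical class embedding: per-class mean (centroid) or
    coordinate-wise median of the rows of z (shape [N, D])."""
    reducer = np.mean if method == "mean" else np.median
    protos = np.zeros((num_classes, z.shape[1]))
    for c in range(num_classes):
        members = z[labels == c]
        if len(members) > 0:
            protos[c] = reducer(members, axis=0)
    return protos


def proj_contrastive_loss(z, labels, num_classes, pos_method="mean",
                          neg_method="mean", tau=0.1):
    """InfoNCE-style loss with separate positive/negative projections.

    Numerator: similarity of z_i to g_plus(c_i).
    Denominator: similarities of z_i to g_minus(c_j) for every sample j in
    the batch. The ProjNCE adjustment term is NOT included.
    """
    z = normalize(z)
    g_plus = normalize(class_projection(z, labels, num_classes, pos_method))
    g_minus = normalize(class_projection(z, labels, num_classes, neg_method))
    pos = np.sum(z * g_plus[labels], axis=1) / tau        # shape [N]
    neg = z @ g_minus[labels].T / tau                      # shape [N, N]
    # Numerically stable log-sum-exp over the batch dimension.
    m = neg.max(axis=1, keepdims=True)
    log_denom = m[:, 0] + np.log(np.exp(neg - m).sum(axis=1))
    return float(np.mean(log_denom - pos))
```

Setting both `pos_method` and `neg_method` to `"mean"` gives a centroid-based contrastive loss; choosing different methods for the two slots mimics the extra freedom that separate $g_+$ and $g_-$ provide.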

Abstract

Self-supervised contrastive learning (SSCL) has emerged as a powerful paradigm for representation learning and has been studied from multiple perspectives, including mutual information and geometric viewpoints. However, supervised contrastive (SupCon) approaches have received comparatively little attention in this context: for instance, while InfoNCE used in SSCL is known to form a lower bound on mutual information (MI), the relationship between SupCon and MI remains unexplored. To address this gap, we introduce ProjNCE, a generalization of the InfoNCE loss that unifies supervised and self-supervised contrastive objectives by incorporating projection functions and an adjustment term for negative pairs. We prove that ProjNCE constitutes a valid MI bound and affords greater flexibility in selecting projection strategies for class embeddings. Building on this flexibility, we go beyond the centroid-based class embeddings of SupCon and investigate a variety of projection methods. Extensive experiments on image and audio datasets demonstrate that ProjNCE consistently outperforms both SupCon and standard cross-entropy training. Our work thus refines SupCon from two complementary perspectives, information-theoretic and projection-based, and offers broadly applicable improvements whenever SupCon serves as the foundational contrastive objective.
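For reference, the standard InfoNCE bound mentioned above can be estimated from a $K \times K$ matrix of critic scores whose diagonal holds the positive pairs, via $I(X;Y) \ge \log K - \mathcal{L}_{\text{NCE}}$. The NumPy sketch below computes this quantity; the function name is illustrative, and it implements plain InfoNCE, not ProjNCE.

```python
import numpy as np


def infonce_mi_lower_bound(scores):
    """InfoNCE estimate log K - L_NCE from a K x K critic matrix.

    scores[i, j] = critic value psi(x_i, y_j); positives lie on the diagonal.
    Since each log-softmax term is <= 0, the estimate never exceeds log K.
    """
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[0]
    # Row-wise log-softmax evaluated at the diagonal (stable log-sum-exp).
    m = scores.max(axis=1, keepdims=True)
    log_norm = m[:, 0] + np.log(np.exp(scores - m).sum(axis=1))
    loss = -np.mean(np.diag(scores) - log_norm)
    return float(np.log(k) - loss)
```

Uninformative (constant) scores give an estimate of 0, while sharply diagonal scores push it toward its ceiling of $\log K$, which is why InfoNCE-based MI estimates saturate for small batch sizes.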

Paper Structure

This paper contains 28 sections, 5 theorems, 34 equations, 3 figures, 5 tables.

Key Result

Proposition 2.1

For any $g_+$ and $g_-$, the projection-incorporated InfoNCE in eq:NCE_proj bounds the mutual information as … where …

Figures (3)

  • Figure 1: t-SNE plots of CIFAR-10 embeddings from ResNet-18 trained with label noise of probability $0.3$. The four panels are obtained with different losses: (a) SupCon, (b) ProjNCE, (c) eq:ProjNCE_beta with $\beta=5$, (d) eq:ProjNCE_beta with $\beta=10$. The adjustment term in eq:R forces the embedding clusters to spread out.
  • Figure 2: We estimate the mutual information $I(f({\bf X});C)$ between the learned embedding and the class label. ProjNCE attains higher estimated mutual information than SupCon on every dataset except Caltech256, where the two tie; we attribute this largely to its use of a valid mutual-information bound.
  • Figure 3: Effect of bandwidth $h$. Results are shown for $\ell_1$, $\ell_2$, and cosine $(\cos)$ dissimilarities. With the exception of $(d=\cos, h=0.7)$ on STL10, ProjNCE-perp is largely insensitive to kernel parameters. The sharp dip at $(d=\cos, h=0.7)$, with accuracy below $20\%$, likely reflects an outlier or convergence to a poor local optimum. Overall, ProjNCE-perp requires minimal hyperparameter tuning.
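Figure 3 varies the bandwidth $h$ and the dissimilarity ($\ell_1$, $\ell_2$, cosine) of a kernel estimator. The sketch below is a generic Nadaraya-Watson regression with a Gaussian kernel on such a dissimilarity: it averages `values` (for example, embeddings $f(x_i)$) with weights set by how close each `key` is to a query. Using one-hot labels as keys, as in the check below, is an assumption made purely for illustration; this summary does not specify the kernel or the conditioning inputs that ProjNCE-perp actually uses, and the function names are hypothetical.

```python
import numpy as np


def dissimilarity(queries, keys, kind="l2"):
    """Pairwise dissimilarity between rows of queries [M, D] and keys [N, D]."""
    if kind == "l1":
        return np.abs(queries[:, None, :] - keys[None, :, :]).sum(axis=-1)
    if kind == "l2":
        return np.linalg.norm(queries[:, None, :] - keys[None, :, :], axis=-1)
    if kind == "cos":
        qn = queries / np.linalg.norm(queries, axis=1, keepdims=True)
        kn = keys / np.linalg.norm(keys, axis=1, keepdims=True)
        return 1.0 - qn @ kn.T
    raise ValueError(f"unknown dissimilarity: {kind}")


def nadaraya_watson(queries, keys, values, h=0.5, kind="l2"):
    """Nadaraya-Watson estimate of E[value | key = query] using the Gaussian
    kernel exp(-(d / h)^2 / 2) on dissimilarity d with bandwidth h."""
    d = dissimilarity(queries, keys, kind)
    logw = -0.5 * (d / h) ** 2
    logw -= logw.max(axis=1, keepdims=True)  # stabilize before exponentiating
    w = np.exp(logw)
    w /= w.sum(axis=1, keepdims=True)        # each row of weights sums to 1
    return w @ values
```

With one-hot label keys and a small $h$, the weights concentrate on same-class samples and the estimate reduces to the class centroid; a very large $h$ instead averages over all samples, which illustrates why the bandwidth can matter.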

Theorems & Definitions (17)

  • Proposition 2.1
  • proof
  • Corollary 2.2
  • proof
  • Definition 2.3: ProjNCE
  • Definition 3.1: ProjNCE-perp
  • Proposition 3.2
  • proof
  • Proposition 3.3
  • proof
  • ...and 7 more