Generalizing Supervised Contrastive learning: A Projection Perspective
Minoh Jeong, Alfred Hero
TL;DR
This work addresses the weak connection between supervised contrastive learning (SupCon) and mutual information (MI) by introducing ProjNCE, a projection-based generalization of the InfoNCE loss that yields a valid MI lower bound via an adjustment term. By allowing distinct projections for positives and negatives, ProjNCE unifies SupCon and InfoNCE and enables the exploration of class-embedding strategies beyond centroids. The authors analyze how SupCon fits within the ProjNCE framework and show that optimizing ProjNCE yields a tighter bound on MI; experiments across vision and audio datasets show consistent improvements over SupCon and standard cross-entropy. They further study projection methods (orthogonal, median, and MLP-based) and provide practical estimators (e.g., Nadaraya-Watson) to approximate conditional embeddings, achieving higher MI and better downstream accuracy. Overall, ProjNCE offers a broadly applicable enhancement for supervised contrastive learning that is robust to label noise and can incorporate auxiliary priors or side information through the design of the projection.
Abstract
Self-supervised contrastive learning (SSCL) has emerged as a powerful paradigm for representation learning and has been studied from multiple perspectives, including mutual information and geometric viewpoints. However, supervised contrastive (SupCon) approaches have received comparatively little attention in this context: for instance, while the InfoNCE loss used in SSCL is known to form a lower bound on mutual information (MI), the relationship between SupCon and MI remains unexplored. To address this gap, we introduce ProjNCE, a generalization of the InfoNCE loss that unifies supervised and self-supervised contrastive objectives by incorporating projection functions and an adjustment term for negative pairs. We prove that ProjNCE constitutes a valid MI bound and affords greater flexibility in selecting projection strategies for class embeddings. Building on this flexibility, we revisit the centroid-based class embeddings in SupCon by exploring a variety of projection methods. Extensive experiments on image and audio datasets demonstrate that ProjNCE consistently outperforms both SupCon and standard cross-entropy training. Our work thus refines SupCon from two complementary perspectives, the information-theoretic and the projection viewpoints, and offers broadly applicable improvements whenever SupCon serves as the foundational contrastive objective.
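As background for the objects named in the abstract, the following is a minimal NumPy sketch of the standard InfoNCE loss, the well-known bound I(X; Y) >= log N - L_InfoNCE, and a centroid-based contrast in which each embedding is scored against per-class mean embeddings. It is not the ProjNCE objective: the adjustment term and the projection functions are not specified in this section, so none of them appear here. The function names (`info_nce`, `mi_lower_bound`, `centroid_contrast`) and the temperature value are assumptions chosen for illustration.

```python
import numpy as np


def _l2_normalize(z):
    # Scale each row to unit length so dot products are cosine similarities.
    return z / np.linalg.norm(z, axis=1, keepdims=True)


def _log_softmax(logits):
    # Numerically stable row-wise log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def info_nce(z_a, z_b, temperature=0.1):
    """Standard InfoNCE: row i of z_a is the positive for row i of z_b."""
    logits = _l2_normalize(z_a) @ _l2_normalize(z_b).T / temperature
    return -np.mean(np.diag(_log_softmax(logits)))


def mi_lower_bound(loss, n_candidates):
    """Classical InfoNCE bound on MI: log N minus the loss."""
    return np.log(n_candidates) - loss


def centroid_contrast(z, labels, temperature=0.1):
    """Illustrative only: contrast each embedding against class centroids.

    The class mean stands in for the 'class embedding'; this is an assumed
    simplification, not the SupCon or ProjNCE loss from the paper.
    """
    z = _l2_normalize(z)
    classes = np.unique(labels)  # sorted unique class labels
    centroids = np.stack([z[labels == c].mean(axis=0) for c in classes])
    logits = z @ _l2_normalize(centroids).T / temperature
    target = np.searchsorted(classes, labels)  # column of each row's class
    return -np.mean(_log_softmax(logits)[np.arange(len(z)), target])


rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
labels = np.array([0, 0, 1, 1, 2, 2, 3, 3])
loss = info_nce(z, z + 0.05 * rng.normal(size=z.shape))
bound = mi_lower_bound(loss, len(z))
class_loss = centroid_contrast(z, labels)
```

Swapping the mean in `centroid_contrast` for another per-class statistic (for example a median) is one way to experiment with the idea of alternative class embeddings, again only as an assumption-laden sketch.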
