Supervised Contrastive Representation Learning: Landscape Analysis with Unconstrained Features
Tina Behnia, Christos Thrampoulidis
TL;DR
This work analyzes supervised contrastive (SC) representation learning under the unconstrained features model (UFM), showing that neural-collapse-like geometry emerges at local optima and that the optimization landscape is benign when the embedding dimension satisfies $d > k$. By reformulating SC as a convex relaxation on the Gram matrix $G=H^\top H$, the authors prove that all local minima are global and that any two global minimizers share the same implicit geometry up to rotation, with a unique global solution in the convex program. They further characterize global solutions under label imbalance: for STEP-imbalanced data, the global optimum has a structured block form, while in the balanced case the optimal geometry reduces to a simplex equiangular tight frame (ETF). These results provide a theoretical foundation for SC-based representation learning in over-parameterized networks and offer insight into how class imbalance shapes embedding geometry, motivating future work on optimization dynamics and broader data regimes.
Abstract
Recent findings reveal that over-parameterized deep neural networks, trained beyond zero training error, exhibit a distinctive structural pattern at the final layer, termed neural collapse (NC). These results indicate that the final hidden-layer outputs in such networks display minimal within-class variation over the training set. While existing research extensively investigates this phenomenon under cross-entropy loss, fewer studies focus on its contrastive counterpart, the supervised contrastive (SC) loss. Through the lens of NC, this paper employs an analytical approach to study the solutions derived from optimizing the SC loss. We adopt the unconstrained features model (UFM) as a representative proxy for unveiling NC-related phenomena in sufficiently over-parameterized deep networks. We show that, despite the non-convexity of SC loss minimization, all local minima are global minima. Furthermore, the minimizer is unique (up to a rotation). We prove our results by formalizing a tight convex relaxation of the UFM. Finally, through this convex formulation, we delve deeper into characterizing the properties of global solutions under label-imbalanced training data.
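As an illustration of the balanced-case geometry mentioned in the TL;DR, the following minimal NumPy sketch builds a simplex ETF, forms its Gram matrix $G = H^\top H$, and checks that rotating the embeddings leaves $G$ unchanged. It is not taken from the paper: the helper name `simplex_etf`, the choice $k = 4$, and the use of one unit-norm embedding per class with $d = k$ are assumptions made purely for illustration.

```python
import numpy as np

def simplex_etf(k: int) -> np.ndarray:
    """Hypothetical helper: k x k matrix whose columns form a unit-norm simplex ETF."""
    # Center the identity (subtract the mean direction), then rescale to unit norm.
    centered = np.eye(k) - np.ones((k, k)) / k
    return np.sqrt(k / (k - 1)) * centered

k = 4                      # number of classes (illustrative choice)
H = simplex_etf(k)         # columns play the role of class embeddings, d = k
G = H.T @ H                # Gram matrix G = H^T H

# Off-diagonal entries of G: pairwise inner products between distinct classes.
off_diag = G[~np.eye(k, dtype=bool)]

# Apply a random orthogonal transform Q to the embeddings.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((k, k)))
G_rot = (Q @ H).T @ (Q @ H)

print("unit norms:", np.allclose(np.diag(G), 1.0))
print("equal angles:", np.allclose(off_diag, -1.0 / (k - 1)))
print("Gram matrix unchanged by rotation:", np.allclose(G, G_rot))
```

Because the Gram matrix depends only on inner products, any orthogonal transform of $H$ yields the same $G$, which is the sense in which a statement about $G$ pins down the geometry only up to rotation.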
