Your contrastive learning problem is secretly a distribution alignment problem
Zihao Chen, Chi-Heng Lin, Ran Liu, Jingyun Xiao, Eva L Dyer
TL;DR
This work reframes contrastive learning as a distribution alignment problem by casting it as a transport problem between augmented views. It introduces Generalized Contrastive Alignment (GCA), which uses target transport plans and proximal-point updates to flexibly encode matching constraints via (unbalanced) optimal transport, improving alignment and uniformity of learned representations. The authors establish connections between GCA and existing CL objectives (INCE, RINCE, BYOL), provide convergence and complexity analysis, and demonstrate empirical gains on standard and corrupted augmentations as well as domain generalization, with GCA-UOT often delivering the best performance. The framework enables incorporating domain knowledge and robust alignment strategies into self-supervised learning, suggesting broad applicability beyond standard image domains. Theoretical results link iterative GCA updates to tighter alignment bounds and higher latent-space uniformity, which translate into improved downstream classification performance. Overall, GCA offers a principled, scalable path to more robust, domain-aware self-supervised representations.
Abstract
Despite the success of contrastive learning (CL) in vision and language, its theoretical foundations and mechanisms for building representations remain poorly understood. In this work, we build connections between noise contrastive estimation losses widely used in CL and distribution alignment with entropic optimal transport (OT). This connection allows us to develop a family of different losses and multistep iterative variants for existing CL methods. Intuitively, by using more information from the distribution of latents, our approach allows a more distribution-aware manipulation of the relationships within augmented sample sets. We provide theoretical insights and experimental evidence demonstrating the benefits of our approach for generalized contrastive alignment. Through this framework, it is possible to leverage tools in OT to build unbalanced losses to handle noisy views and customize the representation space by changing the constraints on alignment. By reframing contrastive learning as an alignment problem and leveraging existing optimization tools for OT, our work provides new insights and connections between different self-supervised learning models in addition to new tools that can be more easily adapted to incorporate domain knowledge into learning.
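The link between noise contrastive estimation losses and entropic OT described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names (`gibbs_kernel`, `transport_plan`, `diagonal_nll`, `info_nce`), the row-stochastic scaling of the plan, and the toy similarity matrix are all assumptions made for illustration. The sketch relies on one standard fact: row-normalizing the Gibbs kernel exp(sim / eps) gives a softmax over each row, so the negative log of the diagonal entries matches an InfoNCE-style loss with temperature eps. Further Sinkhorn-style iterations, which alternate column and row normalization, are one way to read the "multistep iterative variants" mentioned above.

```python
import math

# Hypothetical helper names; a sketch under the assumptions stated above.

def gibbs_kernel(sim, eps):
    """Entropic-OT kernel: K[i][j] = exp(sim[i][j] / eps)."""
    return [[math.exp(s / eps) for s in row] for row in sim]

def normalize_rows(P):
    """Scale each row to sum to 1."""
    return [[v / sum(row) for v in row] for row in P]

def normalize_cols(P):
    """Scale each column to sum to 1."""
    n_cols = len(P[0])
    col_sums = [sum(row[j] for row in P) for j in range(n_cols)]
    return [[row[j] / col_sums[j] for j in range(n_cols)] for row in P]

def transport_plan(sim, eps=0.5, n_iters=1):
    """n_iters=1 is a single row normalization (a softmax per row).
    Each extra iteration adds a column step followed by a row step."""
    P = normalize_rows(gibbs_kernel(sim, eps))
    for _ in range(n_iters - 1):
        P = normalize_rows(normalize_cols(P))
    return P

def diagonal_nll(P):
    """Mean negative log of the mass placed on the positive pairs (i, i)."""
    n = len(P)
    return -sum(math.log(P[i][i]) for i in range(n)) / n

def info_nce(sim, tau):
    """Reference InfoNCE with positives on the diagonal."""
    total = 0.0
    for i, row in enumerate(sim):
        log_denominator = math.log(sum(math.exp(s / tau) for s in row))
        total += log_denominator - row[i] / tau
    return total / len(sim)

# Toy similarities between two augmented views of three samples (made-up values).
sim = [[0.9, 0.1, 0.2],
       [0.0, 0.8, 0.3],
       [0.2, 0.1, 0.7]]

one_step_loss = diagonal_nll(transport_plan(sim, eps=0.5, n_iters=1))
multi_step_loss = diagonal_nll(transport_plan(sim, eps=0.5, n_iters=50))
```

In this sketch `one_step_loss` coincides with `info_nce(sim, 0.5)`, while the multistep plan also has approximately uniform column marginals, so it uses information from the whole batch of latents rather than from each row in isolation. Unbalanced variants would relax these marginal constraints; they are not shown here.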
