Probabilistic Variational Contrastive Learning
Minoh Jeong, Seonho Kim, Alfred Hero
TL;DR
This work introduces Variational Contrastive Learning (VCL), a decoder-free ELBO-maximization framework that converts deterministic contrastive embeddings into probabilistic ones by mapping each input to a projected normal posterior on the unit sphere and treating InfoNCE as a surrogate reconstruction term. The approach adds a KL regularizer toward a uniform prior on the sphere, yielding a symmetrized objective that encourages both alignment and isotropic use of embedding dimensions; two instantiations, VSimCLR and VSupCon, demonstrate the method in self-supervised and supervised settings. The work establishes theoretical connections between InfoNCE and the ELBO, including a generalization bound for the KL term and favorable properties of this term compared with mutual-information (MI)-based bounds. Empirically, VCL mitigates dimensional collapse, preserves or improves mutual information with labels, achieves competitive classification accuracy, and provides meaningful posterior uncertainty estimates that relate to sample typicality, ambiguity, and out-of-distribution behavior, enabling uncertainty-aware decisions in downstream tasks.
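The projected normal posterior can be illustrated with a short NumPy sketch. It assumes an encoder that outputs a mean `mu` and a log-variance `log_var` per input (hypothetical names), draws a Gaussian sample by reparameterization, and normalizes that sample onto the unit sphere. The summary does not say how the KL to the uniform prior is computed, so the sketch uses the closed-form Gaussian KL to N(0, I) as a tractable stand-in: because N(0, I) projects to the uniform distribution on the sphere, the data-processing inequality makes this value an upper bound on the sphere KL. This is an assumption for illustration, not necessarily the authors' estimator.

```python
import numpy as np


def sample_projected_normal(mu, log_var, rng):
    """Reparameterized draw z = u / ||u|| with u ~ N(mu, diag(exp(log_var)))."""
    eps = rng.standard_normal(mu.shape)          # noise independent of the parameters
    u = mu + np.exp(0.5 * log_var) * eps         # Gaussian sample in R^d
    return u / np.linalg.norm(u, axis=-1, keepdims=True)  # project onto S^{d-1}


def gaussian_kl_to_standard(mu, log_var):
    """Per-row KL( N(mu, diag(exp(log_var))) || N(0, I) ).

    ASSUMPTION: used as a surrogate for KL(projected normal || uniform sphere).
    N(0, I) projects to the uniform distribution on the unit sphere, and pushing
    both distributions through the same map u -> u / ||u|| cannot increase KL
    (data-processing inequality), so this value upper-bounds the sphere KL.
    """
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)


rng = np.random.default_rng(0)
mu = rng.standard_normal((4, 8))                 # hypothetical encoder means
log_var = np.full((4, 8), -2.0)                  # hypothetical encoder log-variances
z = sample_projected_normal(mu, log_var, rng)
print(np.linalg.norm(z, axis=-1))                # every row has unit norm
print(gaussian_kl_to_standard(mu, log_var))      # one nonnegative value per input
```

Setting `mu` to zero and `log_var` to zero gives a KL of exactly zero, matching the fact that an isotropic standard Gaussian projects to the uniform sphere prior.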
Abstract
Deterministic embeddings learned by contrastive learning (CL) methods such as SimCLR and SupCon achieve state-of-the-art performance but lack a principled mechanism for uncertainty quantification. We propose Variational Contrastive Learning (VCL), a decoder-free framework that maximizes the evidence lower bound (ELBO) by interpreting the InfoNCE loss as a surrogate reconstruction term and adding a KL divergence regularizer toward a uniform prior on the unit hypersphere. We model the approximate posterior $q_\theta(z \mid x)$ as a projected normal distribution, which enables sampling of probabilistic embeddings. Our two instantiations, VSimCLR and VSupCon, replace deterministic embeddings with samples from $q_\theta(z \mid x)$ and incorporate a normalized KL term into the loss. Experiments on multiple benchmarks demonstrate that VCL mitigates dimensional collapse, enhances mutual information with class labels, and matches or outperforms deterministic baselines in classification accuracy, while also providing meaningful uncertainty estimates through the posterior model. VCL thus equips contrastive learning with a probabilistic foundation and provides a new basis for contrastive approaches.
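To make the shape of the loss concrete, here is a minimal NumPy sketch of a VSimCLR-style forward pass. It rests on several assumptions not stated in the abstract: the encoder outputs a mean and log-variance per view (hypothetical `mu`, `log_var`); the InfoNCE term is the standard SimCLR NT-Xent loss computed on sampled embeddings; the KL term is replaced by the Gaussian KL to N(0, I), which upper-bounds the KL to the uniform sphere prior; and "normalized" is taken to mean dividing by the embedding dimension d, with a hypothetical weight `beta`. It is a sketch, not the authors' implementation, and it shows only the forward computation.

```python
import numpy as np


def sample_projected_normal(mu, log_var, rng):
    """Reparameterized draw z = u / ||u|| with u ~ N(mu, diag(exp(log_var)))."""
    u = mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)
    return u / np.linalg.norm(u, axis=-1, keepdims=True)


def gaussian_kl_to_standard(mu, log_var):
    """Per-row KL(N(mu, diag(exp(log_var))) || N(0, I)).
    ASSUMPTION: surrogate (upper bound) for the KL to the uniform sphere prior."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)


def info_nce(z1, z2, temperature=0.5):
    """SimCLR-style InfoNCE (NT-Xent) for paired unit-norm views z1[i], z2[i]."""
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)           # (2n, d)
    sim = z @ z.T / temperature                    # cosine similarity / temperature
    np.fill_diagonal(sim, -np.inf)                 # a sample is not its own candidate
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])  # index of each positive
    row_max = sim.max(axis=1, keepdims=True)       # stabilize the log-sum-exp
    lse = (row_max + np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))).ravel()
    log_prob_pos = sim[np.arange(2 * n), pos] - lse
    return -log_prob_pos.mean()


def vsimclr_style_loss(mu1, log_var1, mu2, log_var2, rng, beta=1.0, temperature=0.5):
    """Hypothetical VSimCLR-style objective: InfoNCE on sampled embeddings plus
    a KL term. ASSUMPTION: 'normalized' means dividing the KL by the dimension d."""
    z1 = sample_projected_normal(mu1, log_var1, rng)
    z2 = sample_projected_normal(mu2, log_var2, rng)
    d = mu1.shape[1]
    kl = 0.5 * (gaussian_kl_to_standard(mu1, log_var1).mean()
                + gaussian_kl_to_standard(mu2, log_var2).mean())
    return info_nce(z1, z2, temperature) + beta * kl / d


rng = np.random.default_rng(0)
mu1, mu2 = rng.standard_normal((2, 16, 32))        # two views: batch 16, d = 32
log_var = np.full((16, 32), -3.0)                  # hypothetical log-variances
loss = vsimclr_style_loss(mu1, log_var, mu2, log_var, rng)
print(loss)
```

Because each sample is written as `mu + sigma * eps`, an autodiff framework such as PyTorch or JAX would propagate gradients through the sampled embeddings to both `mu` and `log_var`; NumPy is used here only to keep the sketch dependency-light.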
