Probabilistic Contrastive Learning with Explicit Concentration on the Hypersphere
Hongwei Bran Li, Cheng Ouyang, Tamaz Amiranashvili, Matthew S. Rosen, Bjoern Menze, Juan Eugenio Iglesias
TL;DR
The paper tackles the lack of explicit uncertainty estimation in self-supervised contrastive learning by placing representations on the unit hypersphere and modeling them with a von Mises-Fisher (vMF) distribution. It introduces an unnormalized form $\psi(\boldsymbol{x}; \boldsymbol{\mu}, \kappa) = \exp(\kappa \boldsymbol{\mu}^T \boldsymbol{x})$ and a learnable concentration parameter $\kappa$ that serves as a direct measure of uncertainty (a larger $\kappa$ means a more tightly concentrated embedding, i.e. lower uncertainty), regulated by an $\ell_2$ penalty to avoid degenerate solutions. A probabilistic embedding alignment loss, $L_{align} = - \lambda_{align} (\kappa_1 + \kappa_2) \boldsymbol{\mu}_1^T \boldsymbol{\mu}_2$, is combined with the standard SimCLR objective to yield a total loss that preserves discriminativeness while encoding uncertainty. Empirical results on CIFAR-10-C demonstrate that $\kappa$ tracks corruption severity and enables failure analysis, and concatenating $\kappa$ with features improves OOD detection across several benchmarks. The approach is compatible with multiple contrastive frameworks, offering a scalable path to uncertainty-aware SSL applicable to high-stakes domains like autonomous driving and medical imaging.
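The TL;DR names three ingredients: the unnormalized vMF score $\psi$, the alignment loss $L_{align}$, and an $\ell_2$ penalty on $\kappa$, all added to SimCLR. The following is a minimal pure-Python sketch of how they could fit together, not the authors' implementation. Assumptions: $\boldsymbol{\mu}$ is the $\ell_2$-normalized encoder output and $\kappa$ is a separate positive scalar per sample (how the network produces it is not specified here); the penalty is taken to be $\lambda_\kappa(\kappa_1^2 + \kappa_2^2)$; the SimCLR (NT-Xent) term is computed on $\boldsymbol{\mu}$ alone; terms are averaged over the batch and summed with unit weights; `tau`, `lam_kappa`, and every helper name are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Project a raw feature vector onto the unit hypersphere (this gives mu)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def unnormalized_vmf(x: Sequence[float], mu: Sequence[float], kappa: float) -> float:
    """psi(x; mu, kappa) = exp(kappa * mu^T x), with the vMF normalizer dropped."""
    return math.exp(kappa * dot(mu, x))


def align_loss(mu1: Sequence[float], kappa1: float,
               mu2: Sequence[float], kappa2: float,
               lam_align: float = 1.0) -> float:
    """L_align = -lam_align * (kappa1 + kappa2) * mu1^T mu2 for one positive pair."""
    return -lam_align * (kappa1 + kappa2) * dot(mu1, mu2)


def kappa_penalty(kappa1: float, kappa2: float, lam_kappa: float = 0.1) -> float:
    """ASSUMED form of the l2 penalty: lam_kappa * (kappa1^2 + kappa2^2)."""
    return lam_kappa * (kappa1 ** 2 + kappa2 ** 2)


def nt_xent(z1: List[Vector], z2: List[Vector], tau: float = 0.5) -> float:
    """Standard SimCLR NT-Xent loss; z1[i] and z2[i] are two views of sample i."""
    z = z1 + z2
    n = len(z1)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same sample
        sims = [dot(z[i], z[j]) / tau for j in range(2 * n) if j != i]
        m = max(sims)  # log-sum-exp shift for numerical stability
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims))
        total += log_denom - dot(z[i], z[pos]) / tau
    return total / (2 * n)


def total_loss(mu1: List[Vector], k1: List[float],
               mu2: List[Vector], k2: List[float],
               lam_align: float = 1.0, lam_kappa: float = 0.1,
               tau: float = 0.5) -> float:
    """ASSUMED combination: SimCLR on mu + batch-mean alignment + batch-mean penalty."""
    n = len(mu1)
    l_align = sum(align_loss(a, ka, b, kb, lam_align)
                  for a, ka, b, kb in zip(mu1, k1, mu2, k2)) / n
    l_reg = sum(kappa_penalty(ka, kb, lam_kappa) for ka, kb in zip(k1, k2)) / n
    return nt_xent(mu1, mu2, tau) + l_align + l_reg


# Toy batch of two samples, each with two augmented views (values are made up).
mu1 = [l2_normalize([1.0, 0.2, 0.0]), l2_normalize([0.0, 1.0, 0.3])]
mu2 = [l2_normalize([0.9, 0.3, 0.1]), l2_normalize([0.1, 0.8, 0.4])]
k1, k2 = [5.0, 2.0], [4.0, 1.5]
print(total_loss(mu1, k1, mu2, k2))
```

Under this assumed penalty, setting the derivative of the per-pair alignment-plus-penalty term to zero gives a finite optimum $\kappa^* = \lambda_{align}\,\boldsymbol{\mu}_1^T\boldsymbol{\mu}_2 / (2\lambda_\kappa)$ (for $\lambda_{align}=1$, $\lambda_\kappa=0.1$ and identical views, $\kappa^*=5$). Without the penalty, the alignment term keeps decreasing as $\kappa$ grows whenever $\boldsymbol{\mu}_1^T\boldsymbol{\mu}_2 > 0$, which illustrates why some regularizer on $\kappa$ is needed.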
Abstract
Self-supervised contrastive learning has predominantly adopted deterministic methods, which are not suited for environments characterized by uncertainty and noise. This paper introduces a new perspective on incorporating uncertainty into contrastive learning by embedding representations within a spherical space, inspired by the von Mises-Fisher distribution (vMF). We introduce an unnormalized form of vMF and leverage the concentration parameter, kappa ($\kappa$), as a direct, interpretable measure to quantify uncertainty explicitly. This approach not only provides a probabilistic interpretation of the embedding space but also offers a method to calibrate model confidence against varying levels of data corruption and different data characteristics. Our empirical results demonstrate that the estimated concentration parameter correlates strongly with the degree of unforeseen data corruption encountered at test time, enables failure analysis, and enhances existing out-of-distribution detection methods.
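To make "unnormalized form of vMF" concrete, the standard vMF density on the unit sphere $\mathbb{S}^{p-1} \subset \mathbb{R}^p$ (here $p$ denotes the embedding dimension, a symbol introduced for this sketch) and the unnormalized score $\psi$ relate as below, where $I_v$ is the modified Bessel function of the first kind. A common motivation for dropping $C_p(\kappa)$ is that this Bessel term is expensive and numerically awkward in high dimensions; whether that is the authors' exact reasoning is an assumption. The last line is one possible reading of the alignment loss, not necessarily the paper's derivation: it equals the negated, scaled sum of each view's log-score under the other view's vMF.

```latex
f_p(\boldsymbol{x}; \boldsymbol{\mu}, \kappa) = C_p(\kappa)\, \exp(\kappa \boldsymbol{\mu}^T \boldsymbol{x}),
\qquad
C_p(\kappa) = \frac{\kappa^{p/2-1}}{(2\pi)^{p/2}\, I_{p/2-1}(\kappa)},
\qquad \boldsymbol{x}, \boldsymbol{\mu} \in \mathbb{S}^{p-1}

\psi(\boldsymbol{x}; \boldsymbol{\mu}, \kappa)
  = \frac{f_p(\boldsymbol{x}; \boldsymbol{\mu}, \kappa)}{C_p(\kappa)}
  = \exp(\kappa \boldsymbol{\mu}^T \boldsymbol{x})

-\lambda_{align}\big[\log \psi(\boldsymbol{\mu}_2; \boldsymbol{\mu}_1, \kappa_1)
  + \log \psi(\boldsymbol{\mu}_1; \boldsymbol{\mu}_2, \kappa_2)\big]
  = -\lambda_{align} (\kappa_1 + \kappa_2)\, \boldsymbol{\mu}_1^T \boldsymbol{\mu}_2
  = L_{align}
```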
