Probabilistic Variational Contrastive Learning

Minoh Jeong, Seonho Kim, Alfred Hero

TL;DR

This work introduces Variational Contrastive Learning (VCL), a decoder-free ELBO-maximization framework that converts deterministic contrastive embeddings into probabilistic ones by mapping inputs to a projected normal posterior on the unit sphere and treating InfoNCE as a surrogate reconstruction term. The approach adds a KL regularizer to a uniform sphere prior, yielding a symmetrized objective that encourages both alignment and isotropic use of embedding dimensions; two instantiations, VSimCLR and VSupCon, demonstrate the method in self-supervised and supervised settings. Theoretical connections between InfoNCE and the ELBO are established, including a generalization bound for the KL term and its favorable properties compared to MI-based bounds. Empirically, VCL mitigates dimensional collapse, preserves or improves mutual information with labels, achieves competitive classification accuracy, and provides meaningful posterior uncertainty measurements that relate to sample typicality, ambiguity, and out-of-distribution behavior, enabling uncertainty-aware decisions in downstream tasks.

Abstract

Deterministic embeddings learned by contrastive learning (CL) methods such as SimCLR and SupCon achieve state-of-the-art performance but lack a principled mechanism for uncertainty quantification. We propose Variational Contrastive Learning (VCL), a decoder-free framework that maximizes the evidence lower bound (ELBO) by interpreting the InfoNCE loss as a surrogate reconstruction term and adding a KL divergence regularizer to a uniform prior on the unit hypersphere. We model the approximate posterior $q_\theta(z \mid x)$ as a projected normal distribution, enabling the sampling of probabilistic embeddings. Our two instantiations, VSimCLR and VSupCon, replace deterministic embeddings with samples from $q_\theta(z \mid x)$ and incorporate a normalized KL term into the loss. Experiments on multiple benchmarks demonstrate that VCL mitigates dimensional collapse, enhances mutual information with class labels, and matches or outperforms deterministic baselines in classification accuracy, while providing meaningful uncertainty estimates through the posterior model. VCL thus equips contrastive learning with a probabilistic foundation, serving as a new basis for contrastive approaches.
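The sampling step described in the abstract can be illustrated with a minimal NumPy sketch. It draws embeddings from a projected normal distribution (a Gaussian draw rescaled to the unit sphere) and scores two views with an InfoNCE loss. The diagonal covariance, the temperature value, the one-directional form of the loss, and the function names `sample_projected_normal` and `info_nce` are assumptions made for illustration, not the authors' implementation. The KL regularizer is left out because its exact form is not given in this summary.

```python
import numpy as np

def sample_projected_normal(mu, log_var, rng):
    """Draw one sample per row: z = y / ||y|| with y ~ N(mu, diag(exp(log_var))).

    The diagonal covariance is an assumption made for this sketch.
    """
    eps = rng.standard_normal(mu.shape)
    y = mu + np.exp(0.5 * log_var) * eps  # reparameterized Gaussian draw
    return y / np.linalg.norm(y, axis=1, keepdims=True)  # project onto the unit sphere

def info_nce(z_a, z_b, temperature=0.5):
    """One-directional InfoNCE; row i of z_a and row i of z_b form the positive pair."""
    logits = z_a @ z_b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
mu_a = rng.standard_normal((8, 16))               # hypothetical encoder means, view A
mu_b = mu_a + 0.1 * rng.standard_normal((8, 16))  # hypothetical encoder means, view B
log_var = np.full((8, 16), -4.0)                  # hypothetical log-variances

z_a = sample_projected_normal(mu_a, log_var, rng)
z_b = sample_projected_normal(mu_b, log_var, rng)
loss = info_nce(z_a, z_b)
print(f"InfoNCE on sampled embeddings: {loss:.3f}")
```

In a variational setup of this kind, the sampled `z_a` and `z_b` take the place of the deterministic embeddings, and a KL term computed from `mu` and `log_var` would be added to `loss`.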

Paper Structure

This paper contains 64 sections, 9 theorems, 81 equations, 13 figures, 8 tables.

Key Result

Lemma 3.1

Let ${\boldsymbol x}$ and ${\boldsymbol z}$ be conditionally independent given ${\boldsymbol z}'$. Then, the reconstruction term in Section subsec:elbo is bounded as where const. is independent of ${\boldsymbol z}$.

Figures (13)

  • Figure 1: Graphical illustration of SimCLR and Variational SimCLR (VSimCLR).
  • Figure 2: Embedding visualization for SimCLR and VSimCLR on the CIFAR-10 test set. (a) t-SNE of SimCLR; (b) t-SNE of VSimCLR; (c) UMAP of SimCLR; (d) UMAP of VSimCLR. VSimCLR preserves the characteristic cluster structure of contrastive learning while introducing probabilistic embeddings regularized by \ref{eq:upper_kl}.
  • Figure 3: Singular-value spectrum of the embedding covariance on CIFAR-10/100. VSimCLR mitigates dimensional collapse.
  • Figure 4: Estimate of $I({\boldsymbol z};{\boldsymbol c})$.
  • Figure 5: Sample images from CIFAR-10, organized by class (columns) and sorted by their corresponding $\log \det(K)$ (rows). In each column, the top image has the highest $\log \det(K)$ and the bottom image the lowest; the overlaid numbers indicate each image’s $\log \det(K)$.
  • ...and 8 more figures
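
Two quantities named in the captions above, the singular-value spectrum of the embedding covariance (Figure 3) and $\log \det(K)$ of a posterior covariance (Figure 5), can be computed as in the following hedged NumPy sketch. The function names, the synthetic embeddings, and the example matrix `K` are illustrative assumptions; the sketch reproduces neither the paper's data nor its exact evaluation protocol.

```python
import numpy as np

def covariance_spectrum(z):
    """Singular values of the sample covariance of embeddings z, largest first."""
    z_centered = z - z.mean(axis=0, keepdims=True)
    cov = z_centered.T @ z_centered / (z.shape[0] - 1)
    return np.linalg.svd(cov, compute_uv=False)

def log_det_cov(K):
    """log det(K) for a positive-definite covariance K, computed stably."""
    _, logdet = np.linalg.slogdet(K)
    return logdet

rng = np.random.default_rng(0)
full = rng.standard_normal((512, 8))  # synthetic embeddings using all 8 dimensions
collapsed = full.copy()
collapsed[:, 4:] = 0.0                # synthetic collapse: 4 dimensions carry no variance

s_full = covariance_spectrum(full)
s_collapsed = covariance_spectrum(collapsed)
print("full spectrum     :", np.round(s_full, 3))
print("collapsed spectrum:", np.round(s_collapsed, 3))

K = np.diag([1.0, 2.0, 4.0])          # hypothetical posterior covariance
print("log det(K):", log_det_cov(K))
```

A spectrum whose tail drops to near zero, as in `s_collapsed`, is the signature of dimensional collapse; a larger `log_det_cov(K)` corresponds to a more spread-out posterior.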

Theorems & Definitions (13)

  • Lemma 3.1
  • proof
  • Proposition 3.2
  • Theorem 3.3: Informal
  • proof
  • proof
  • Theorem B.1
  • Corollary B.2
  • Lemma B.3: Rademacher generalization bound
  • Lemma B.4: Dudley’s entropy-integral bound
  • ...and 3 more