Bridging Mini-Batch and Asymptotic Analysis in Contrastive Learning: From InfoNCE to Kernel-Based Losses

Panagiotis Koromilas; Giorgos Bouritsas; Theodoros Giannakopoulos; Mihalis Nicolaou; Yannis Panagakis

TL;DR

The paper asks what different contrastive learning (CL) losses actually optimise and how they relate to hyperspherical energy minimisation (HEM). It proves that, under certain conditions, several CL families, including InfoNCE-type losses, share the same minimisers both for their mini-batch objectives and for their expectations asymptotically, and that both cases connect to energy minimisation on the sphere. It then introduces the Decoupled Hyperspherical Energy Loss (DHEL), which separates the alignment of positive pairs from uniformity. The analysis is extended to Kernel Contrastive Learning (KCL), whose expected loss is independent of batch size, so its minimisers (e.g., regular simplices, cross-polytopes) can be identified in the non-asymptotic regime under kernel assumptions. Experiments on CIFAR-10/100, STL-10, and ImageNet-100 show improved downstream performance, robustness to batch size and hyperparameters, and reduced dimensionality collapse.
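
To make the mention of regular simplices concrete, the following is a minimal NumPy sketch, not taken from the paper. It builds a regular simplex of `m` unit vectors and compares a Gaussian-kernel pairwise energy against a random configuration on the same sphere. The function names, the Gaussian kernel, and the value `t=2.0` are illustrative assumptions; the paper's own kernel conditions are not reproduced here.

```python
import numpy as np

def regular_simplex(m):
    """Return m unit vectors in R^m whose pairwise inner products all equal -1/(m-1)."""
    pts = np.eye(m) - 1.0 / m  # standard basis vectors, centred at the origin
    return pts / np.linalg.norm(pts, axis=1, keepdims=True)

def gaussian_energy(x, t=2.0):
    """Mean over pairs i != j of exp(-t * ||x_i - x_j||^2).

    The Gaussian kernel and t are hypothetical choices for illustration.
    """
    sq_dist = 2.0 - 2.0 * (x @ x.T)  # squared distances, valid for unit-norm rows
    off_diag = ~np.eye(len(x), dtype=bool)
    return np.mean(np.exp(-t * sq_dist[off_diag]))

m = 5
simplex = regular_simplex(m)

# Random unit vectors on the same sphere, for comparison.
rng = np.random.default_rng(0)
random_pts = rng.normal(size=(m, m))
random_pts /= np.linalg.norm(random_pts, axis=1, keepdims=True)

print("simplex energy:", gaussian_energy(simplex))
print("random energy: ", gaussian_energy(random_pts))
```

Because the Gaussian kernel is convex and decreasing in the squared distance, and the pairwise inner products of `m` unit vectors cannot sum to less than `-m`, the simplex attains the lowest possible value of this particular energy, so the first printed number is never larger than the second.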

Abstract

What do different contrastive learning (CL) losses actually optimize for? Although multiple CL methods have demonstrated remarkable representation learning capabilities, the differences in their inner workings remain largely opaque. In this work, we analyse several CL families and prove that, under certain conditions, they admit the same minimisers when optimizing either their batch-level objectives or their expectations asymptotically. In both cases, an intimate connection with the hyperspherical energy minimisation (HEM) problem resurfaces. Drawing inspiration from this, we introduce a novel CL objective, coined Decoupled Hyperspherical Energy Loss (DHEL). DHEL simplifies the problem by decoupling the target hyperspherical energy from the alignment of positive examples while preserving the same theoretical guarantees. Going one step further, we show the same results hold for another relevant CL family, namely kernel contrastive learning (KCL), with the additional advantage of the expected loss being independent of batch size, thus identifying the minimisers in the non-asymptotic regime. Empirical results demonstrate improved downstream performance and robustness across combinations of different batch sizes and hyperparameters and reduced dimensionality collapse, on several computer vision datasets.
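
The decoupling idea in the abstract can be illustrated with a short NumPy sketch. It contrasts one common InfoNCE variant with a loss whose uniformity term ignores the second view entirely. This is one reading of "decoupling", written under stated assumptions; it is not the paper's exact definition of DHEL, and the names `infonce_loss`, `decoupled_loss`, and the temperature `tau` are illustrative.

```python
import numpy as np

def _normalise(x):
    """Project each row onto the unit sphere."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def _logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along an axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def infonce_loss(u, v, tau=0.5):
    """One-directional InfoNCE variant: negatives come from the other view,
    and the positive pair stays in the denominator."""
    u, v = _normalise(u), _normalise(v)
    sim_uv = u @ v.T / tau
    align = -np.diag(sim_uv)
    return np.mean(align + _logsumexp(sim_uv, axis=1))

def decoupled_loss(u, v, tau=0.5):
    """Assumed decoupled form: the alignment term uses (u_i, v_i) only, and the
    uniformity term uses same-view pairs (u_i, u_j) with j != i."""
    u, v = _normalise(u), _normalise(v)
    align = -np.sum(u * v, axis=1) / tau
    sim_uu = u @ u.T / tau
    np.fill_diagonal(sim_uu, -np.inf)  # exclude j == i from the log-sum-exp
    return np.mean(align + _logsumexp(sim_uu, axis=1))

rng = np.random.default_rng(0)
u = rng.normal(size=(8, 16))
v = u + 0.1 * rng.normal(size=(8, 16))  # a lightly perturbed second view
print(infonce_loss(u, v), decoupled_loss(u, v))
```

In this sketch, changing `v` only moves the alignment term of `decoupled_loss`, whereas in `infonce_loss` the same change also alters the log-sum-exp term.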

Paper Structure

This paper contains 36 sections, 13 theorems, 46 equations, 19 figures, 2 tables.

Introduction
Related Work
Preliminaries and Notation
Contrastive Learning setup.
Reconciling Contrastive Loss Variants
Decoupled Hyperspherical Energy Loss
Expected loss: What happens when the batch size is finite?
Decoupling uniformity from alignment
Theoretical properties of DHEL.
Minima of Kernel Contrastive Learning
Experimental Evaluation
Downstream performance and robustness
Ablation studies
Novel metric: Wasserstein distance between similarity distributions.
Discussion
...and 21 more sections

Key Result

Theorem 4.1

Consider the following optimisation problem:

$$\min_{\mathbf{U}, \mathbf{V} \in (\mathbb{S}^{d-1})^M} L_{\textnormal{CL-sym}}(\mathbf{U}, \mathbf{V}),$$

where $\mathbf{U}, \mathbf{V}$ are tuples of $M$ vectors on the unit $(d-1)$-sphere and $L_{\textnormal{CL-sym}}$ is the symmetric version of any of the loss functions $L_{\textnormal{a}}(\cdot, \cdot; \phi, \psi), L_{\textnormal{b}}(\cdot, \cdot; \phi, \psi)$ as defined in Eq. (eq:general_losses). [...] Additionally, (4) if $\psi, \phi$ are strictly increa[...]

Figures (19)

Figure : (a) CIFAR10
Figure : (a) Rank
Figure : (a) Alignment
Figure : (a) CIFAR10
Figure : (b) CIFAR100
...and 14 more figures

Theorems & Definitions (19)

Theorem 4.1
Corollary 4.2
Proposition 4.3
Theorem 5.1
Theorem 6.1
Proposition 6.2
Theorem 2.1
proof
Corollary 2.2
Corollary 2.3
...and 9 more
