Understanding InfoNCE: Transition Probability Matrix Induced Feature Clustering
Ge Cheng, Shuo Wang, Yun Zhang
TL;DR
This work advances the theoretical understanding of InfoNCE by introducing an explicit feature space and a transition probability matrix (TPM) that models augmentation dynamics. It shows that InfoNCE drives the empirical co-occurrence probability $\mathbb{P}_{ij}$ toward a constant target defined by the TPM. Building on this, the authors propose Scaled Convergence InfoNCE (SC-InfoNCE), which adds tunable parameters $\delta$ and $\gamma$ to scale and bias the convergence target, giving flexible control over feature similarity alignment. The theoretical analysis links gradient dynamics to augmentation-induced structure, and experiments on vision, graph, and text tasks indicate that SC-InfoNCE yields more controllable and robust representations, with empirical gains over strong baselines. By enabling task-aware scaling of the convergence target and providing a TPM-based lens on augmentation, the work offers a principled pathway for designing more stable and transferable contrastive representations. The framework also suggests practical steps for deriving task-specific sub-TPMs to guide sampling and invariance regularization in diverse domains.
Abstract
Contrastive learning has emerged as a cornerstone of unsupervised representation learning across vision, language, and graph domains, with InfoNCE as its dominant objective. Despite this empirical success, the theoretical understanding of InfoNCE remains limited. In this work, we introduce an explicit feature space to model augmented views of samples and a transition probability matrix to capture data augmentation dynamics. We demonstrate that InfoNCE drives the probability that two views share the same source toward a constant target defined by this matrix, which naturally induces feature clustering in the representation space. Leveraging this insight, we propose Scaled Convergence InfoNCE (SC-InfoNCE), a novel loss function that introduces a tunable convergence target. By scaling the target matrix, SC-InfoNCE enables flexible control over feature similarity alignment, allowing the training objective to better match the statistical properties of downstream data. Experiments on benchmark datasets, including image, graph, and text tasks, show that SC-InfoNCE consistently achieves strong and reliable performance across diverse domains.
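For readers less familiar with the objective discussed above, the following is a minimal NumPy sketch of the standard InfoNCE loss, not of SC-InfoNCE, whose exact form is not given in this summary. The function name `info_nce`, the temperature value, and the toy data are assumptions made purely for illustration. The returned matrix of row-wise softmax probabilities is the kind of pairwise quantity the analysis reasons about: entry (i, j) is the model's estimate that view j is the positive partner of view i.

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.5):
    """Standard InfoNCE over a batch (illustrative sketch).

    Row i of z_a and row i of z_b are assumed to be two augmented
    views of the same source sample; all other rows act as negatives.
    """
    # L2-normalise so that dot products are cosine similarities.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)

    logits = z_a @ z_b.T / temperature
    # Subtract the row maximum for numerical stability of the softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    probs = np.exp(log_probs)            # row-wise softmax, shape (batch, batch)
    loss = -np.mean(np.diag(log_probs))  # positives sit on the diagonal
    return loss, probs

# Toy data (hypothetical): positives are slightly perturbed copies of the anchors.
rng = np.random.default_rng(0)
anchors = rng.normal(size=(8, 16))
positives = anchors + 0.1 * rng.normal(size=(8, 16))
loss, probs = info_nce(anchors, positives)
```

Minimizing this loss pushes probability mass in each row of `probs` toward the diagonal entry. A variant with a tunable target, as the abstract describes, would change what these probabilities converge to rather than how the similarities are computed.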
