Understanding InfoNCE: Transition Probability Matrix Induced Feature Clustering

Ge Cheng, Shuo Wang, Yun Zhang

TL;DR

This work advances the theoretical understanding of InfoNCE by introducing an explicit feature space and a transition probability matrix (TPM) to model augmentation dynamics, showing that InfoNCE drives the empirical co-occurrence probability $\mathbb{P}_{ij}$ toward a constant target defined by the TPM. Building on this, the authors propose Scaled Convergence InfoNCE (SC-InfoNCE), which adds tunable parameters $\delta$ and $\gamma$ to scale and bias the convergence target, enabling flexible control over feature similarity alignment. Theoretical analysis links gradient dynamics to augmentation-induced structure and demonstrates that SC-InfoNCE yields more controllable and robust representations across vision, graph, and text tasks, with empirical gains over strong baselines. By enabling task-aware scaling of the convergence target and providing a TPM-based lens on augmentation, the work offers a principled pathway for designing more stable and transferable contrastive representations. The framework also suggests practical steps for deriving task-specific sub-TPMs to guide sampling and invariance regularization in diverse domains.

Abstract

Contrastive learning has emerged as a cornerstone of unsupervised representation learning across vision, language, and graph domains, with InfoNCE as its dominant objective. Despite its empirical success, the theoretical underpinnings of InfoNCE remain limited. In this work, we introduce an explicit feature space to model augmented views of samples and a transition probability matrix to capture data augmentation dynamics. We demonstrate that InfoNCE optimizes the probability of two views sharing the same source toward a constant target defined by this matrix, naturally inducing feature clustering in the representation space. Leveraging this insight, we propose Scaled Convergence InfoNCE (SC-InfoNCE), a novel loss function that introduces a tunable convergence target. By scaling the target matrix, SC-InfoNCE enables flexible control over feature similarity alignment, allowing the training objective to better match the statistical properties of downstream data. Experiments on benchmark datasets, including image, graph, and text tasks, show that SC-InfoNCE consistently achieves strong and reliable performance across diverse domains.
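
For readers unfamiliar with the objective discussed in the abstract, the following is a minimal NumPy sketch of the standard InfoNCE loss, not of SC-InfoNCE, whose exact form is not given in this summary. It assumes two batches of embeddings in which row i of each batch is a view of the same source sample. The function name `info_nce` and the default `temperature` value are illustrative choices, not taken from the paper.

```python
import numpy as np

def info_nce(z1, z2, temperature=0.5):
    """Standard InfoNCE over two view batches of shape (n, d).

    Assumption: row i of z1 and row i of z2 come from the same source
    sample (the positive pair); all other rows of z2 act as negatives.
    """
    # L2-normalize so that dot products are cosine similarities.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)

    # Pairwise similarity logits, scaled by the temperature.
    logits = z1 @ z2.T / temperature

    # Row-wise log-softmax, shifted by the row maximum for stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The positive pair for row i sits on the diagonal.
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
aligned = info_nce(z, z)           # every positive pair matches exactly
mismatched = info_nce(z, z[::-1])  # positives paired with the wrong rows
```

In this toy setup the aligned loss is lower than the mismatched one, because the softmax over each row assigns the most mass to the view that shares the same source.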

Paper Structure

This paper contains 30 sections, 40 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: The t-SNE visualizations of representations learned by SCL, InfoNCE, f-MICL, and SC-InfoNCE (ours) on CIFAR-10 show that SC-InfoNCE yields tighter intra-class clusters and clearer inter-class separation.
  • Figure 2: The eigenvalue spectra of the feature covariance matrices for representations learned by SCL, InfoNCE, f-MICL, and SC-InfoNCE (ours) on CIFAR-10. Each curve shows the ordered eigenvalues (in descending order) of the covariance matrix computed from L2-normalized embeddings on the validation set.
  • Figure 3: Effect of scaling the convergence target on the convergence trajectory. The vertical ordering of $m_{i,j}$ entries is consistent with their theoretical counterparts $\mathbb{P}_{ij}$.
  • Figure 4: Convergence trajectories for different contrastive losses on the synthetic dataset, visualized using class confusion probabilities and class-level embedding similarities. The vertical ordering of $m_{i,j}$ entries is consistent with their theoretical counterparts $\mathbb{P}_{ij}$.
  • Figure 5: Convergence comparison under different values of $\gamma$ (normalized by batch size) with fixed $\delta=1$ on a synthetic dataset.
  • ...and 3 more figures
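
The eigenvalue spectrum described in the Figure 2 caption can be sketched as follows: L2-normalize the embeddings, compute their feature covariance matrix, and sort its eigenvalues in descending order. This is an illustrative reconstruction from the caption alone; the function name `covariance_spectrum` and the random input are assumptions, and the authors' actual plotting code may differ.

```python
import numpy as np

def covariance_spectrum(embeddings):
    """Descending eigenvalues of the covariance of L2-normalized embeddings.

    embeddings: array of shape (n_samples, n_features), with n_features >= 2.
    """
    # Normalize each embedding to unit length, as stated in the caption.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)

    # Feature covariance matrix of shape (n_features, n_features).
    cov = np.cov(z, rowvar=False)

    # eigvalsh handles symmetric matrices and returns ascending eigenvalues.
    eigvals = np.linalg.eigvalsh(cov)
    return eigvals[::-1]

# Hypothetical stand-in for validation-set embeddings.
rng = np.random.default_rng(0)
spectrum = covariance_spectrum(rng.normal(size=(200, 8)))
```

A spectrum that decays slowly indicates that variance is spread over many directions, while a sharp drop suggests the representation occupies only a few dimensions.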

Theorems & Definitions (3)

  • Definition 1: Closure
  • Definition 2: Generating Set
  • Definition 3: Explicit Feature Space