SinSim: Sinkhorn-Regularized SimCLR

M. Hadi Sepanj, Paul Fieguth

TL;DR

SinSim addresses the lack of global geometric regularization in self-supervised contrastive learning by integrating Sinkhorn regularization into the SimCLR framework. It adds an entropy-regularized optimal transport objective on intermediate representations $h$ while preserving standard contrastive learning on final embeddings $z$, yielding a geometry-aware latent space. The paper provides theoretical justification for dispersion and demonstrates improved linear and nonlinear classification performance across MNIST, CIFAR-10/100, and STL-10, complemented by UMAP visualizations showing clearer class separation. These results suggest that transport-based regularization is a viable tool to produce robust, well-structured representations and can be extended to larger-scale and multimodal settings.

Abstract

Self-supervised learning has revolutionized representation learning by eliminating the need for labeled data. Contrastive learning methods, such as SimCLR, maximize the agreement between augmented views of an image but lack explicit regularization to enforce a globally structured latent space. This limitation often leads to suboptimal generalization. We propose SinSim, a novel extension of SimCLR that integrates Sinkhorn regularization from optimal transport theory to enhance representation structure. The Sinkhorn loss, an entropy-regularized Wasserstein distance, encourages a well-dispersed and geometry-aware feature space, preserving discriminative power. Empirical evaluations on various datasets demonstrate that SinSim outperforms SimCLR and achieves competitive performance against prominent self-supervised methods such as VICReg and Barlow Twins. UMAP visualizations further reveal improved class separability and structured feature distributions. These results indicate that integrating optimal transport regularization into contrastive learning provides a principled and effective mechanism for learning robust, well-structured representations. Our findings open new directions for applying transport-based constraints in self-supervised learning frameworks.
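The two-part objective described above (a contrastive loss on the final embeddings $z$ plus a Sinkhorn term on the intermediate representations $h$) can be sketched in plain Python. This is a minimal illustration and not the authors' implementation. The additive weighting by $\beta$, the squared Euclidean cost, the uniform batch weights, the temperature value, and every function name (`nt_xent`, `sinkhorn_cost`, `sinsim_loss`) are assumptions made for the example.

```python
import math

def logsumexp(values):
    # Numerically stable log(sum(exp(v))).
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nt_xent(z1, z2, tau=0.5):
    # SimCLR-style contrastive loss: view i of z1 is the positive of view i of z2.
    z = z1 + z2
    n = len(z1)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)
        logits = [cosine(z[i], z[k]) / tau for k in range(2 * n) if k != i]
        total += logsumexp(logits) - cosine(z[i], z[pos]) / tau
    return total / (2 * n)

def sinkhorn_cost(h1, h2, lam=0.1, n_iter=50):
    # Entropy-regularized OT between two uniformly weighted batches,
    # computed with log-domain Sinkhorn updates (assumed cost: squared Euclidean).
    n, m = len(h1), len(h2)
    cost = [[sq_dist(a, b) for b in h2] for a in h1]
    log_a, log_b = -math.log(n), -math.log(m)
    f, g = [0.0] * n, [0.0] * m
    for _ in range(n_iter):
        f = [lam * (log_a - logsumexp([(g[j] - cost[i][j]) / lam for j in range(m)]))
             for i in range(n)]
        g = [lam * (log_b - logsumexp([(f[i] - cost[i][j]) / lam for i in range(n)]))
             for j in range(m)]
    # Transport cost <plan, cost> under the resulting coupling.
    return sum(math.exp((f[i] + g[j] - cost[i][j]) / lam) * cost[i][j]
               for i in range(n) for j in range(m))

def sinsim_loss(h1, h2, z1, z2, beta=1.0):
    # Assumed combination: contrastive term on z plus beta times Sinkhorn term on h.
    return nt_xent(z1, z2) + beta * sinkhorn_cost(h1, h2)
```

With `beta=0.0` the sketch reduces to the contrastive term alone, which matches the statement in the Figure 2 caption that standard SimCLR corresponds to SinSim at $\beta = 0$.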

Paper Structure

This paper contains 18 sections, 2 theorems, 20 equations, 7 figures, 2 tables.

Key Result

Lemma 1

Let $P$ and $Q$ be defined as above, and denote by $\gamma^*$ the optimal coupling in the definition of $W_\lambda$ (eq:W_lambda). Then, In particular, as $\lambda\to 0$, minimizing $W_\lambda(P,Q)$ forces the learned representations to align (by the "diagonal" cost) while still controlling dispersion in the latent space.
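The limiting behaviour in the lemma can be illustrated numerically: for two slightly perturbed views of the same points, the entropic coupling puts more of its mass on the diagonal (each point matched to its own second view) as $\lambda$ shrinks. The sketch below is an illustration under assumptions, not the paper's proof. The toy points, the squared Euclidean cost, the uniform marginals, and the helper names `sinkhorn_plan` and `diagonal_mass` are all hypothetical.

```python
import math

def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def sinkhorn_plan(cost, lam, n_iter=200):
    # Log-domain Sinkhorn with uniform marginals; returns the coupling matrix.
    n, m = len(cost), len(cost[0])
    log_a, log_b = -math.log(n), -math.log(m)
    f, g = [0.0] * n, [0.0] * m
    for _ in range(n_iter):
        f = [lam * (log_a - logsumexp([(g[j] - cost[i][j]) / lam for j in range(m)]))
             for i in range(n)]
        g = [lam * (log_b - logsumexp([(f[i] - cost[i][j]) / lam for i in range(n)]))
             for j in range(m)]
    return [[math.exp((f[i] + g[j] - cost[i][j]) / lam) for j in range(m)]
            for i in range(n)]

def diagonal_mass(plan):
    # Total coupling mass assigned to matching index pairs (i, i).
    return sum(plan[i][i] for i in range(len(plan)))

# Two toy "views": view_b is a small perturbation of view_a.
view_a = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
view_b = [[0.05, 0.0], [0.95, 0.05], [0.0, 1.05], [1.05, 0.95]]
cost = [[sum((x - y) ** 2 for x, y in zip(a, b)) for b in view_b] for a in view_a]

masses = {lam: diagonal_mass(sinkhorn_plan(cost, lam)) for lam in (1.0, 0.1, 0.01)}
for lam, mass in masses.items():
    print(f"lambda={lam}: diagonal mass = {mass:.4f}")
```

For large $\lambda$ the entropy term spreads mass across many pairs; for small $\lambda$ the coupling approaches the unregularized transport plan, which in this toy setup is the diagonal matching.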

Figures (7)

  • Figure 1: UMAP visualization of learned embeddings. (a) and (c) show the embedding space for SimCLR, while (b) and (d) illustrate SinSim's feature representations. SinSim achieves better class separation and reduced overlap, suggesting that Sinkhorn regularization improves the structured alignment of representations.
  • Figure 2: Effect of Sinkhorn regularization on SinSim performance: classification accuracy as a function of the Sinkhorn regularization strength ($\beta$) when training on MNIST for 10 epochs. The solid orange curve represents the classification accuracy achieved by the SinSim model as a function of $\beta$, while the dashed red line corresponds to the baseline performance of standard SimCLR (equivalent to SinSim at $\beta = 0$). Incorporating Sinkhorn regularization consistently improves feature representations, leading to enhanced classification accuracy. This preliminary experiment primarily serves to capture the trend of Sinkhorn's influence on contrastive learning rather than as an absolute performance benchmark.
  • Figure 3: As in Figure 2, but here based on CIFAR-10 data. Unlike the consistent improvement observed in Figure 2 with MNIST, the results on CIFAR-10 fluctuate significantly with $\beta$ and show a more modest trend. Nevertheless, SinSim outperforms the baseline for all values of $\beta$, suggesting that Sinkhorn regularization contributes to enhanced feature representations. This experiment primarily serves to capture the trend of Sinkhorn's influence on contrastive learning on a more complex dataset.
  • Figure 4: Effect of Sinkhorn iterations on SinSim classification accuracy on MNIST. Increasing the number of iterations improves accuracy up to 40, beyond which performance slightly decreases. The red dashed line represents a baseline SinSim performance at a default iteration count of 10.
  • Figure 5: Effect of Sinkhorn iterations, as in Figure 4, but now assessed on CIFAR-10. The overall trend is very similar to that of MNIST, with slightly increased variability but the same overall conclusion.
  • ...and 2 more figures
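For context on the iteration sweeps in Figures 4 and 5: a Sinkhorn loop truncated after a fixed number of iterations only approximately satisfies the marginal constraints of the transport problem, and the violation shrinks as iterations are added. The sketch below measures that violation with the classic scaling-vector form of the algorithm. It is an illustration only and makes no claim about why accuracy peaks where it does in the paper. The toy points, the value $\lambda = 1$, the uniform marginals, and the function name `marginal_error` are assumptions.

```python
import math

def marginal_error(cost, lam, n_iter):
    # Classic Sinkhorn scaling: plan = diag(u) @ K @ diag(v), with K = exp(-cost / lam).
    n, m = len(cost), len(cost[0])
    a, b = 1.0 / n, 1.0 / m  # uniform target marginals
    kernel = [[math.exp(-c / lam) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        u = [a / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # After the v update the column sums are exact; report the row-sum violation (L1).
    row_sums = [u[i] * sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
    return sum(abs(r - a) for r in row_sums)

points_a = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
points_b = [[0.2, 0.1], [0.9, 0.4], [0.3, 1.2]]
cost = [[sum((x - y) ** 2 for x, y in zip(p, q)) for q in points_b] for p in points_a]

errors = {k: marginal_error(cost, lam=1.0, n_iter=k) for k in (1, 5, 50)}
for k, err in errors.items():
    print(f"{k} iterations: row-marginal error = {err:.2e}")
```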

Theorems & Definitions (4)

  • Lemma 1: Latent Space Dispersion
  • proof
  • Lemma 2: Prevention of Mode Collapse
  • proof