T-REGS: Minimum Spanning Tree Regularization for Self-Supervised Learning

Julie Mordacq, David Loiseaux, Vicky Kalogeiton, Steve Oudot

TL;DR

This paper tackles dimensional collapse and non-uniformity in self-supervised learning (SSL) by introducing T-REGS, a simple, GPU-friendly regularization that maximizes the length of the minimum spanning tree (MST) of the embeddings while constraining them to a compact manifold (a unit sphere). Theoretical results connect MST length to entropy on Riemannian manifolds and show that MST-based optimization promotes uniformity and prevents collapse, with separate analyses for the small-sample and large-sample regimes. T-REGS carries this idea over to SSL by applying the MST-based regularization to each SSL branch, either standalone or as an auxiliary term added to an existing objective. It achieves competitive performance on standard joint-embedding SSL (JE-SSL) benchmarks and improves cross-modal retrieval in CLIP-style settings. The findings indicate that MST regularization offers a principled, scalable route to richer, more uniformly distributed representations for both unimodal and multimodal learning tasks.

Abstract

Self-supervised learning (SSL) has emerged as a powerful paradigm for learning representations without labeled data, often by enforcing invariance to input transformations such as rotations or blurring. Recent studies have highlighted two pivotal properties for effective representations: (i) avoiding dimensional collapse, in which the learned features occupy only a low-dimensional subspace, and (ii) enhancing uniformity of the induced distribution. In this work, we introduce T-REGS, a simple regularization framework for SSL based on the length of the Minimum Spanning Tree (MST) over the learned representation. We provide theoretical analysis demonstrating that T-REGS simultaneously mitigates dimensional collapse and promotes distribution uniformity on arbitrary compact Riemannian manifolds. Several experiments on synthetic data and on classical SSL benchmarks validate the effectiveness of our approach at enhancing representation quality.
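To make the two ingredients named above concrete (MST length and a sphere constraint), here is a minimal pure-Python sketch. It is not the authors' implementation: the function names, the weights `gamma` and `lam`, and the exact forms of the two terms (negative mean MST edge length, mean squared deviation of the norms from 1) are assumptions chosen for illustration, and the sketch only evaluates the objective rather than differentiating it.

```python
import math


def mst_length(points):
    """Total Euclidean edge length of the MST of `points` (Prim's algorithm, O(n^2))."""
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from the current tree to each point
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Add the point that is closest to the current tree.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u]
        # Relax the connection cost of every point still outside the tree.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total


def sphere_penalty(points):
    """Mean squared deviation of the norms from 1 (assumed form of the sphere constraint)."""
    return sum((math.hypot(*p) - 1.0) ** 2 for p in points) / len(points)


def t_reg_value(points, gamma=1.0, lam=1.0):
    """Hypothetical combined objective: reward long MSTs, penalise leaving the unit sphere."""
    return -gamma * mst_length(points) / len(points) + lam * sphere_penalty(points)


# Three collapsed points versus three points spread evenly on the unit circle.
collapsed = [(1.0, 0.0), (1.0, 0.0), (1.0, 0.0)]
spread = [(1.0, 0.0), (-0.5, math.sqrt(3) / 2), (-0.5, -math.sqrt(3) / 2)]
# The spread configuration scores lower (better) than the collapsed one.
print(t_reg_value(spread) < t_reg_value(collapsed))
```

In a training setting the same quantities would be computed on a batch of embeddings with an autodiff framework, so that gradients flow through the lengths of the edges selected by the MST.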

Paper Structure

This paper contains 41 sections, 6 theorems, 26 equations, 8 figures, 9 tables, 1 algorithm.

Key Result

Theorem 4.1

Under the above conditions, the maximum of $E(\mathrm{MST}(X))$ over the point sets $X \subset B$ of fixed cardinality $n$ is attained when the points of $X$ lie on the sphere $S = \partial B$, at the vertices of a regular $(n-1)$-simplex that has $S$ as its smallest cir
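As a worked illustration of the configuration named in the theorem, the computation below assumes that $S$ is the unit sphere centered at the origin and that $E$ denotes total Euclidean edge length. Neither normalization is stated in the excerpt above, so this is a sketch under those assumptions rather than part of the theorem.

```latex
% Assumptions: x_1, ..., x_n are unit vectors with a common pairwise
% inner product c, and the simplex is centered, i.e. \sum_i x_i = 0.
\begin{align*}
0 = \Big\| \sum_{i=1}^{n} x_i \Big\|^2 = n + n(n-1)\,c
  &\;\Longrightarrow\; c = \langle x_i, x_j \rangle = -\frac{1}{n-1}, \\
\| x_i - x_j \|^2 = 2 - 2c = \frac{2n}{n-1}
  &\;\Longrightarrow\; \| x_i - x_j \| = \sqrt{\frac{2n}{n-1}}, \\
E\big(\mathrm{MST}(X)\big) = (n-1)\sqrt{\frac{2n}{n-1}}
  &= \sqrt{2n(n-1)}.
\end{align*}
```

Because all pairwise distances coincide, every spanning tree of this configuration has the same length. The common cosine similarity $-1/(n-1)$ tends to zero as $n$ grows.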

Figures (8)

  • Figure 1: Overview of T-REGS. (Left) Two augmented views $X, X'$ are encoded by $f_\theta$ and projected by $h_\phi$ into embeddings $Z, Z'$. Training jointly (i) minimizes the Mean Squared Error, $\mathcal{L}_\text{MSE}(Z,Z')$, to enforce view invariance (or alternatively the objective function of a given SSL method, $\mathcal{L}_\text{SSL}(Z,Z')$, when used as an auxiliary term); (ii) maximizes the minimum-spanning-tree length on each branch, $\mathcal{L}_\mathrm{E}(Z)$ and $\mathcal{L}_\mathrm{E}(Z')$, repelling edge-connected points in $\mathrm{MST}(Z)$ and $\mathrm{MST}(Z')$; and (iii) applies sphere constraints $\mathcal{L}_\mathrm{S}(Z)$ and $\mathcal{L}_\mathrm{S}(Z')$. (Right) As a result, T-REGS induces uniformly distributed embeddings without dimensional collapse.
  • Figure 2: Illustration of T-REG with synthetic data. (a-c) $3$-d point cloud analysis: (a) T-REG successfully spreads points uniformly on the sphere by combining $\mathrm{MST}$ length maximization and sphere constraint, (b) using only MST length maximization leads to excessive dilation, (c) stable convergence of T-REG whereas $\mathcal{L}_\text{E}$ alone fails to converge. (d-e) Higher-dimensional analysis ($256$-d): (d) T-REG enforces effective convergence to the $255$-d regular simplex (\ref{prop:cv_to_splx}), (e) stable optimization behavior of T-REG.
  • Figure 3: Sensitivity to Dimensional Collapse. The metrics $-\mathcal{W}_2$ and $-\mathcal{L}_\text{E}$ jointly decrease as the collapse level ($\eta$) increases.
  • Figure 4: Impact of the projector architecture. $\mathcal{L}_\text{MSE} + \mathcal{L}_\text{T-REGS}$ top-1 accuracy (%) on the linear evaluation protocol with 100 pretraining epochs.
  • Figure 5: Histograms of embeddings' cosine similarities on CIFAR-10. With T-REGS as a standalone regularization (orange) or as an auxiliary loss (dark orange), the distribution of pairwise cosine similarities becomes concentrated around zero, indicating that the embeddings are highly decorrelated and approach a regular simplex configuration (\ref{prop:cv_to_splx}).
  • ...and 3 more figures

Theorems & Definitions (11)

  • Definition 3.1
  • Theorem 4.1
  • Proposition 4.2: Eq. (14.25) in apostol2017new
  • Lemma 4.3
  • proof: Proof of \ref{prop:cv_to_splx}
  • Theorem 4.4: costaDeterminingIntrinsicDimension2006
  • Proposition 4.5
  • proof
  • Corollary 4.6
  • proof: Proof of \ref{prop:cv_to_splx} (case $n<d+1$)
  • ...and 1 more