FroSSL: Frobenius Norm Minimization for Efficient Multiview Self-Supervised Learning

Oscar Skean; Aayush Dhakal; Nathan Jacobs; Luis Gonzalo Sanchez Giraldo

TL;DR

FroSSL addresses efficiency bottlenecks in multiview self-supervised learning with an objective that requires no eigendecomposition. A log-Frobenius-norm term on each view's embedding covariance regularizes the covariance eigenvalues to prevent collapse, and a mean-squared error term enforces augmentation invariance. The objective is both dimension-contrastive and sample-contrastive (up to normalization), and because it avoids costly eigendecomposition it scales linearly in the number of views and converges faster. The approach is supported by theoretical analysis of eigenvalue dynamics and by experiments on CIFAR-10/100, STL-10, Tiny ImageNet, and ImageNet-100, which show competitive linear-probe performance and robustness to augmentations, especially with multiple views.

Abstract

Self-supervised learning (SSL) is a popular paradigm for representation learning. Recent multiview methods can be classified as sample-contrastive, dimension-contrastive, or asymmetric network-based, with each family having its own approach to avoiding informational collapse. While these families converge to solutions of similar quality, it can be empirically shown that some methods are epoch-inefficient and require longer training to reach a target performance. Two main approaches to improving efficiency are covariance eigenvalue regularization and using more views. However, these two approaches are difficult to combine due to the computational complexity of computing eigenvalues. We present the objective function FroSSL which reconciles both approaches while avoiding eigendecomposition entirely. FroSSL works by minimizing covariance Frobenius norms to avoid collapse and minimizing mean-squared error to achieve augmentation invariance. We show that FroSSL reaches competitive accuracies more quickly than any other SSL method and provide theoretical and empirical support that this faster convergence is due to how FroSSL affects the eigenvalues of the embedding covariance matrices. We also show that FroSSL learns competitive representations on linear probe evaluation when used to train a ResNet-18 on several datasets, including STL-10, Tiny ImageNet, and ImageNet-100.
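To make the two terms of the objective concrete, the following is a minimal NumPy sketch of a FroSSL-style loss, not the authors' implementation. The function name `frossl_style_loss`, the centering and unit-norm scaling of each view, the equal weighting of the two terms, and the use of the mean embedding across views as the invariance target are all assumptions made for illustration; the abstract only states that a covariance Frobenius norm and a mean-squared error are minimized.

```python
import numpy as np


def frossl_style_loss(views, eps=1e-12):
    """Hypothetical FroSSL-style loss for a list of (N, D) view embeddings.

    Assumptions (not taken from the paper text): each view is centered and
    scaled to unit Frobenius norm, and both terms are weighted equally.
    """
    views = [np.asarray(z, dtype=np.float64) for z in views]
    # Invariance target: the average embedding across all views.
    mean_embedding = np.mean(views, axis=0)

    total = 0.0
    for z in views:
        # Center each dimension, then scale so the covariance has trace ~1.
        zc = z - z.mean(axis=0, keepdims=True)
        zc = zc / (np.linalg.norm(zc) + eps)
        cov = zc.T @ zc  # (D, D) covariance, no eigendecomposition needed

        # Variance term: log of the squared Frobenius norm of the covariance.
        # With fixed trace it is smallest when all eigenvalues are equal.
        variance_term = np.log(np.sum(cov ** 2) + eps)

        # Invariance term: mean squared distance to the cross-view average.
        invariance_term = np.mean(np.sum((z - mean_embedding) ** 2, axis=1))

        total += variance_term + invariance_term
    return total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.standard_normal((256, 8))
    # Three noisy "views" of the same batch; cost grows linearly with views.
    noisy_views = [base + 0.1 * rng.standard_normal(base.shape) for _ in range(3)]
    print(frossl_style_loss(noisy_views))
```

Each view contributes one `D x D` matrix product and one squared-error term, so under these assumptions the cost of the sketch grows linearly with the number of views, which matches the scaling the abstract describes.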

Paper Structure (73 sections, 3 theorems, 15 equations, 2 figures, 8 tables)

Introduction
Background and Notation
The Joint Embedding Self-Supervised Learning Problem
The Three Families of Joint Embedding SSL Objectives
A Framework for Dimension-Contrastive Methods
Multiview Invariance Term
The FroSSL Objective
The Role of the Logarithm
FroSSL is both Sample-contrastive and Dimension-contrastive
On Efficiency in Self-Supervised Learning
Reducing the Number of Epochs with Eigenvalue Dynamics
Reducing the Number of Epochs by Using More Views
Multiview with 3 or More Views
Multi-Patch and Multi-Crop Methods
Exploring Time, Space, and Epoch Tradeoffs
...and 58 more sections

Key Result

Proposition

If every embedding dimension is normalized to have equal variance, then FroSSL is a dimension-contrastive method. The proof is given in the paper's appendix.
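The following identity gives some intuition for why equal variances matter here. It is not the appendix proof, and it assumes (the definition is not reproduced in this summary) that a dimension-contrastive criterion penalizes the off-diagonal entries of the $D \times D$ embedding covariance $\Sigma$. The constant $c$ for the shared per-dimension variance is introduced only for illustration.

```latex
\begin{aligned}
\|\Sigma\|_F^2
  &= \sum_{i=1}^{D} \Sigma_{ii}^2 + \sum_{i \neq j} \Sigma_{ij}^2 \\
  &= D\,c^2 + \sum_{i \neq j} \Sigma_{ij}^2
  \qquad \text{when } \Sigma_{ii} = c \text{ for all } i .
\end{aligned}
```

With the diagonal fixed, the first term is constant, so minimizing $\|\Sigma\|_F^2$ (or its logarithm, which is monotone) is the same as minimizing the off-diagonal covariances.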

Figures (2)

Figure 1: The SSL pipeline used in this work. In general, the encoder and projector may be asymmetric. We use symmetric encoders with shared weights and the same augmentation set for each view. We refer to $X_1$ as view 1 of $X$, and $X_2$ as view 2. Only two views are shown here, though more may be used in practice.
Figure 2: The choice of variance term, $D_{\textrm{var}}(\Sigma_{v}\Vert \mathbf{I})$, has a significant impact on training dynamics. Each subplot visualizes the trajectories of the top 20 eigenvalues of the embedding covariance matrix $\Sigma_{1}$ when trained with dimension-contrastive methods. These trajectories show how quickly $\Sigma_v$ converges to $\gamma I$, which has eigenvalues equal to $\frac{\gamma}{D}$. VICReg, Barlow Twins, and CorInfoMax converge slowly. FroSSL and I-VNE have similar training dynamics, but FroSSL has significantly lower computational complexity because it avoids explicitly computing the eigendecomposition.
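The claim that FroSSL avoids an explicit eigendecomposition rests on a standard linear-algebra fact: for a symmetric matrix, the squared Frobenius norm equals the sum of squared eigenvalues. The short check below illustrates this on random data; the variable names and the `1/N` covariance scaling are assumptions for the example, and `np.linalg.eigvalsh` is called only to verify the identity, not because the objective needs it.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal((128, 16))      # N=128 samples, D=16 dimensions
z = z - z.mean(axis=0, keepdims=True)   # center each dimension
n = z.shape[0]

cov = z.T @ z / n    # (D, D) dimension-wise covariance
gram = z @ z.T / n   # (N, N) sample-wise Gram matrix

# Reference value: sum of squared eigenvalues via an eigendecomposition.
eigenvalues = np.linalg.eigvalsh(cov)
sum_sq_eigs = float(np.sum(eigenvalues ** 2))

# Same quantity from entrywise squares only, with no eigendecomposition.
fro_sq = float(np.sum(cov ** 2))

# The Gram matrix has the same squared Frobenius norm as the covariance.
gram_fro_sq = float(np.sum(gram ** 2))

print(np.isclose(sum_sq_eigs, fro_sq), np.isclose(fro_sq, gram_fro_sq))
```

The second comparison shows that the `D x D` covariance and the `N x N` Gram matrix share the same Frobenius norm, which is consistent with the statement that the objective can be read as either dimension-contrastive or sample-contrastive up to normalization.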

Theorems & Definitions (5)

Definition: Dimension-Contrastive Method
Definition: Sample-Contrastive Method
Proposition
Proposition
Proposition
