FroSSL: Frobenius Norm Minimization for Efficient Multiview Self-Supervised Learning

Oscar Skean, Aayush Dhakal, Nathan Jacobs, Luis Gonzalo Sanchez Giraldo

TL;DR

FroSSL addresses efficiency bottlenecks in multiview self-supervised learning with an eigendecomposition-free objective. A log-Frobenius-norm term on each view's embedding covariance regularizes the covariance eigenvalues, and a mean-squared error term enforces augmentation invariance. The objective unifies dimension- and sample-contrastive signals (up to normalization), and because it avoids costly eigendecomposition it scales linearly in the number of views. The approach is supported by theoretical analysis of eigenvalue dynamics and by experiments on CIFAR-10/100, STL-10, Tiny ImageNet, and ImageNet-100, which show competitive linear-probe performance and robustness to augmentations, especially with multiple views. Overall, FroSSL reaches competitive accuracy in fewer epochs than the SSL methods it is compared against, offering a principled, eigendecomposition-free path to faster, more scalable SSL.

Abstract

Self-supervised learning (SSL) is a popular paradigm for representation learning. Recent multiview methods can be classified as sample-contrastive, dimension-contrastive, or asymmetric network-based, with each family having its own approach to avoiding informational collapse. While these families converge to solutions of similar quality, it can be empirically shown that some methods are epoch-inefficient and require longer training to reach a target performance. Two main approaches to improving efficiency are covariance eigenvalue regularization and using more views. However, these two approaches are difficult to combine due to the computational complexity of computing eigenvalues. We present the objective function FroSSL which reconciles both approaches while avoiding eigendecomposition entirely. FroSSL works by minimizing covariance Frobenius norms to avoid collapse and minimizing mean-squared error to achieve augmentation invariance. We show that FroSSL reaches competitive accuracies more quickly than any other SSL method and provide theoretical and empirical support that this faster convergence is due to how FroSSL affects the eigenvalues of the embedding covariance matrices. We also show that FroSSL learns competitive representations on linear probe evaluation when used to train a ResNet-18 on several datasets, including STL-10, Tiny ImageNet, and ImageNet-100.
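
The two ingredients named in the abstract (a Frobenius-norm term on the embedding covariance and a mean-squared error term across views) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' reference implementation: the function name `frossl_like_loss`, the normalization to unit Frobenius norm, the use of the mean view as the MSE target, and the equal weighting of the two terms are all choices made for this sketch.

```python
import numpy as np


def frossl_like_loss(views, eps=1e-12):
    """Illustrative FroSSL-style loss (hypothetical helper, not the paper's code).

    views: list of (N, D) arrays, one embedding matrix per augmented view.
    """
    normed = []
    for z in views:
        z = z - z.mean(axis=0, keepdims=True)  # center each embedding dimension
        # Assumption: scale to unit Frobenius norm so trace(Z^T Z) is fixed at 1.
        z = z / (np.linalg.norm(z) + eps)
        normed.append(z)

    # Variance term: log of the squared Frobenius norm of each D x D matrix Z^T Z.
    # No eigendecomposition is needed, only a matrix product.
    var_term = sum(
        np.log(np.linalg.norm(z.T @ z, "fro") ** 2 + eps) for z in normed
    )

    # Invariance term: squared error between each view and the mean view
    # (assumption for this sketch), so the cost grows linearly with the view count.
    mean_view = np.mean(normed, axis=0)
    inv_term = sum(np.sum((z - mean_view) ** 2) for z in normed) / len(normed)

    return var_term + inv_term
```

With the trace fixed, the variance term is smallest when the covariance eigenvalues are equal, so a collapsed (rank-one) embedding scores worse than a well-spread one; the invariance term is zero only when all views produce the same normalized embeddings.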

Paper Structure (73 sections, 3 theorems, 15 equations, 2 figures, 8 tables)

This paper contains 73 sections, 3 theorems, 15 equations, 2 figures, 8 tables.

Key Result

Proposition

If every embedding dimension is normalized to have equal variance, then FroSSL is a dimension-contrastive method. The proof appears in the paper's appendix (dimension-contrastive proof).
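
One way to build intuition for this proposition is a small numerical check. The sketch below assumes that "dimension-contrastive" refers to penalizing the off-diagonal entries of the D x D embedding covariance; that reading is ours, since the formal definition is not reproduced in this summary. When every dimension has unit variance, the diagonal of the covariance is constant, so the only part of the squared Frobenius norm that can change is the off-diagonal part. The check also confirms the standard identity that the D x D matrix and the N x N Gram matrix share the same Frobenius norm. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Embeddings with correlated dimensions (synthetic data for illustration).
z = rng.standard_normal((128, 16)) @ rng.standard_normal((16, 16))
z = z - z.mean(axis=0)
z = z / z.std(axis=0)  # every dimension now has unit variance

n, d = z.shape
cov = z.T @ z / n   # D x D covariance, diagonal entries are all 1
gram = z @ z.T / n  # N x N Gram matrix over samples

fro_sq = np.linalg.norm(cov, "fro") ** 2
diag_sq = np.sum(np.diag(cov) ** 2)  # constant under the normalization above
offdiag_sq = fro_sq - diag_sq        # the only part a minimizer can reduce

# The dimension-side and sample-side matrices have the same Frobenius norm.
same_norm = np.isclose(np.linalg.norm(cov, "fro"), np.linalg.norm(gram, "fro"))
```

Here `diag_sq` equals the number of dimensions regardless of the data, so minimizing `fro_sq` under this normalization is the same as minimizing `offdiag_sq`.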

Figures (2)

  • Figure 1: The SSL pipeline used in this work. In general, the encoder and projector may be asymmetric. We use symmetric encoders with shared weights and the same augmentation set for each view. We refer to $X_1$ as view 1 of $X$, and $X_2$ as view 2. Only two views are shown here, though more may be used in practice.
  • Figure 2: The choice of variance term, $D_{\textrm{var}}(\Sigma_{v}\Vert \mathbf{I})$, has a significant impact on training dynamics. Each subplot visualizes the trajectories of the top 20 eigenvalues of the embedding covariance matrix $\Sigma_{1}$ when trained with dimension-contrastive methods. These trajectories show how quickly $\Sigma_v$ converges to $\gamma \mathbf{I}$, which has eigenvalues equal to $\frac{\gamma}{D}$. VICReg, Barlow Twins, and CorInfoMax converge slowly. FroSSL and I-VNE have similar training dynamics, but FroSSL has significantly lower computational complexity because it avoids explicitly computing the eigendecomposition.
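
The Figure 2 caption notes that FroSSL shapes the covariance eigenvalues without computing them. A short sketch of why that is possible: for a symmetric matrix, the squared Frobenius norm equals the sum of squared eigenvalues, so it can be read directly off the matrix entries. The data, shapes, and variable names below are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic embeddings with deliberately uneven per-dimension variances.
z = rng.standard_normal((512, 32)) * np.linspace(0.1, 3.0, 32)
z = z - z.mean(axis=0)
cov = z.T @ z / z.shape[0]

# Route 1: explicit eigendecomposition of the symmetric covariance.
eigvals = np.linalg.eigvalsh(cov)
sum_sq_eig = np.sum(eigvals ** 2)

# Route 2: squared Frobenius norm, just the sum of squared entries.
fro_sq = np.sum(cov ** 2)

# For a fixed trace t, the sum of squared eigenvalues is smallest when all
# D eigenvalues are equal to t / D, which gives the lower bound t^2 / D.
t, d = np.trace(cov), cov.shape[0]
uniform_bound = t ** 2 / d
```

Both routes give the same number, and the gap between `fro_sq` and `uniform_bound` shrinks to zero as the eigenvalues become uniform, which is the flattening behavior the eigenvalue trajectories in the figure describe.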

Theorems & Definitions (5)

  • Definition: Dimension-Contrastive Method
  • Definition: Sample-Contrastive Method
  • Proposition
  • Proposition
  • Proposition