FroSSL: Frobenius Norm Minimization for Efficient Multiview Self-Supervised Learning
Oscar Skean, Aayush Dhakal, Nathan Jacobs, Luis Gonzalo Sanchez Giraldo
TL;DR
FroSSL addresses efficiency bottlenecks in multiview self-supervised learning with a decomposition-free objective: a log-Frobenius-norm term on the embedding covariance regularizes its eigenvalues, and a mean-squared error term enforces augmentation invariance. It unifies dimension-contrastive and sample-contrastive signals (up to normalization) and avoids costly eigendecomposition, enabling linear scaling in the number of views and faster convergence. The approach is supported by theoretical analysis of eigenvalue dynamics and by experiments on CIFAR-10/100, STL-10, Tiny ImageNet, and ImageNet-100, which show competitive linear-probe performance and robustness to augmentations, especially with multiple views. Overall, FroSSL offers a principled, eigendecomposition-free path to faster, more scalable SSL.
Abstract
Self-supervised learning (SSL) is a popular paradigm for representation learning. Recent multiview methods can be classified as sample-contrastive, dimension-contrastive, or asymmetric network-based, with each family having its own approach to avoiding informational collapse. While these families converge to solutions of similar quality, it can be empirically shown that some methods are epoch-inefficient and require longer training to reach a target performance. Two main approaches to improving efficiency are covariance eigenvalue regularization and using more views. However, these two approaches are difficult to combine due to the computational complexity of computing eigenvalues. We present the objective function FroSSL, which reconciles both approaches while avoiding eigendecomposition entirely. FroSSL works by minimizing covariance Frobenius norms to avoid collapse and minimizing mean-squared error to achieve augmentation invariance. We show that FroSSL reaches competitive accuracies more quickly than any other SSL method and provide theoretical and empirical support that this faster convergence is due to how FroSSL affects the eigenvalues of the embedding covariance matrices. We also show that FroSSL learns competitive representations on linear probe evaluation when used to train a ResNet-18 on several datasets, including STL-10, Tiny ImageNet, and ImageNet-100.
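The abstract describes the objective only in words. As a rough illustration, the NumPy sketch below assumes one plausible form of such a loss: for each view, the log of the squared Frobenius norm of a covariance-like matrix, plus the mean-squared error between that view and the per-sample average over views. The function names, the row normalization, the uncentered matrix Z^T Z / N, and the equal weighting of the two terms are all assumptions made for this example; they are not taken from the paper and should not be read as the authors' implementation.

```python
import numpy as np


def normalize_rows(z):
    """Scale every embedding (row of z) to unit L2 norm."""
    return z / np.linalg.norm(z, axis=1, keepdims=True)


def frobenius_regularizer(z):
    """log ||C||_F^2 for the covariance-like matrix C = Z^T Z / N.

    With unit-norm rows, trace(C) = 1, so ||C||_F^2 (the sum of squared
    eigenvalues of C) is smallest when the eigenvalues are uniform.
    No eigendecomposition is needed to evaluate it.
    """
    cov = z.T @ z / z.shape[0]
    return np.log(np.sum(cov ** 2))


def frossl_style_loss(views):
    """Assumed FroSSL-style loss for a list of (N, D) view embeddings."""
    views = [normalize_rows(v) for v in views]
    mean_view = np.mean(views, axis=0)  # per-sample average over views, shape (N, D)
    # Anti-collapse term: one Frobenius norm per view.
    regularizer = sum(frobenius_regularizer(z) for z in views)
    # Invariance term: squared distance of each view to the view average.
    invariance = sum(np.mean(np.sum((z - mean_view) ** 2, axis=1)) for z in views)
    return regularizer + invariance


rng = np.random.default_rng(0)
spread = rng.normal(size=(256, 8))                       # uses all 8 dimensions
collapsed = np.tile(rng.normal(size=(1, 8)), (256, 1))   # every sample maps to one point

print("spread   :", frossl_style_loss([spread, spread]))
print("collapsed:", frossl_style_loss([collapsed, collapsed]))
```

In this toy setup the two views are identical, so the invariance term vanishes and only the Frobenius term differs: the collapsed embeddings have a single nonzero eigenvalue and receive the higher loss, while the spread embeddings score lower. Each view is visited once in the sums, which is consistent with the linear scaling in the number of views mentioned above.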
