WERank: Towards Rank Degradation Prevention for Self-Supervised Learning Using Weight Regularization

Ali Saheb Pasand; Reza Moravej; Mahdi Biparva; Ali Ghodsi

TL;DR

WERank addresses rank degradation (dimensional collapse) in self-supervised learning (SSL) by introducing a layer-wise weight regularization term that nudges each layer toward a near-orthonormal mapping. The method adds a simple regularizer, $\mathcal{L}_{reg} = \sum_{l=1}^{L} \alpha_l \|W_l^T W_l - I\|_F$, where $W_l$ is the weight matrix of layer $l$ and $\alpha_l$ its coefficient, to the SSL objective. This promotes rank preservation across the network without relying on costly per-batch whitening. Empirical results on graph SSL (notably BGRL and other BYOL-style setups) show that WERank helps maintain higher rank in both representations and embeddings, yielding modest but consistent downstream gains, especially under weak data augmentation. The work provides theoretical motivation, validation in a toy setting, and comprehensive graph-domain experiments, positioning WERank as a practical, complementary tool for improving SSL representations with potentially broad applicability.
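The regularizer is simple enough to evaluate directly. Below is a minimal NumPy sketch that computes $\sum_l \alpha_l \|W_l^T W_l - I\|_F$ for a list of weight matrices. It assumes each $W_l$ is stored with shape (d_out, d_in), so $W_l^T W_l$ is a $d_{in} \times d_{in}$ Gram matrix; the helper names `werank_penalty` and `total_loss` and the scalar `ssl_loss` are hypothetical. In actual training the same expression would be written in an autodiff framework so that its gradient reaches the weights; this sketch only evaluates the value.

```python
import numpy as np


def werank_penalty(weights, alphas):
    """Layer-wise penalty sum_l alpha_l * ||W_l^T W_l - I||_F (norm not squared)."""
    if len(weights) != len(alphas):
        raise ValueError("need one coefficient alpha_l per weight matrix W_l")
    total = 0.0
    for W, alpha in zip(weights, alphas):
        # Assumption: W has shape (d_out, d_in), so W^T W is (d_in, d_in).
        gram = W.T @ W
        identity = np.eye(gram.shape[0])
        # Frobenius norm of the deviation from orthonormal columns.
        total += alpha * np.linalg.norm(gram - identity, ord="fro")
    return float(total)


def total_loss(ssl_loss, weights, alphas):
    """Hypothetical combination: SSL objective value plus the WERank term."""
    return ssl_loss + werank_penalty(weights, alphas)


rng = np.random.default_rng(0)
# A (5, 3) matrix with orthonormal columns incurs (numerically) zero penalty.
Q, _ = np.linalg.qr(rng.normal(size=(5, 3)))
print(werank_penalty([Q], [1.0]))                 # close to 0
# A scaled identity 2I is not orthonormal: ||4I - I||_F = 3 * sqrt(3), about 5.196.
print(werank_penalty([2.0 * np.eye(3)], [1.0]))
```

A matrix with orthonormal columns incurs no penalty, while any stretching or shrinking of directions (here a uniform scale of 2) is penalized, which is the sense in which the term nudges each layer toward a near-orthonormal mapping.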

Abstract

A common phenomenon limiting representation quality in Self-Supervised Learning (SSL) is dimensional collapse (also known as rank degeneration), where the learned representations are mapped to a low-dimensional subspace of the representation space. State-of-the-art SSL methods have been shown to suffer from dimensional collapse and fail to maintain full rank. Recent approaches to prevent this problem have proposed contrastive losses, regularization techniques, or architectural tricks. We propose WERank, a new regularizer on the weight parameters of the network that prevents rank degeneration at different layers of the network. We provide empirical evidence and mathematical justification demonstrating the effectiveness of the proposed regularization method in preventing dimensional collapse. We verify the impact of WERank on graph SSL, where dimensional collapse is more pronounced due to the lack of proper data augmentation. We empirically demonstrate that WERank helps BYOL achieve higher rank during SSL pre-training and, consequently, higher downstream accuracy during evaluation probing. Ablation studies and experimental analysis shed light on the underlying factors behind the performance gains of the proposed approach.
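To make the definition of dimensional collapse concrete, the following NumPy sketch detects it from the spectrum of the embedding covariance matrix, the same kind of quantity tracked in Figure 1. The synthetic data, the relative tolerance of `1e-6`, and the helper names `covariance_spectrum` and `numerical_rank` are assumptions for illustration; this is not claimed to be the rank measure used in the paper.

```python
import numpy as np


def covariance_spectrum(Z):
    """Singular values (descending) of the covariance of embeddings Z, shape (n, d)."""
    Zc = Z - Z.mean(axis=0, keepdims=True)        # center each dimension
    cov = Zc.T @ Zc / (Z.shape[0] - 1)            # (d, d) sample covariance
    return np.linalg.svd(cov, compute_uv=False)   # returned in descending order


def numerical_rank(singular_values, rel_tol=1e-6):
    """Count singular values above rel_tol times the largest (tolerance is an assumption)."""
    s = np.asarray(singular_values)
    if s.size == 0 or s[0] == 0:
        return 0
    return int(np.sum(s > rel_tol * s[0]))


# Assumption: synthetic embeddings stand in for real encoder outputs.
rng = np.random.default_rng(0)
n, d = 1000, 8
# Healthy embeddings: spread over all 8 dimensions.
Z_full = rng.normal(size=(n, d))
# Collapsed embeddings: 8-D vectors that all lie in a 2-D subspace.
basis = rng.normal(size=(2, d))
Z_collapsed = rng.normal(size=(n, 2)) @ basis

print(numerical_rank(covariance_spectrum(Z_full)))       # 8
print(numerical_rank(covariance_spectrum(Z_collapsed)))  # 2
```

Both embedding sets live in the same 8-dimensional space; only the spectrum of the covariance reveals that the second set occupies a 2-dimensional subspace.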

Paper Structure

This paper contains 25 sections, 15 equations, 8 figures, and 11 tables.

Introduction
Related work
WERank Regularization
Notation
Rank Degradation Prevention by Weight Regularization
Empirical Study on the Role of Weight Regularization
Experimental Evaluation
Experimental Setup
Experimental Results
Ablation Studies and Empirical Analysis
Conclusion
WERank Implementation
The Impact of WERank on the Learned Representations
WERank Encourages Orthonormality
WERank Encourages Short Mappings
...and 10 more sections

Figures (8)

Figure 1: The singular values of the weight matrices and of the embedding-space covariance matrix during training. Top: VICReg with no regularization; bottom: VICReg with the WERank regularizer. The augmentation magnitude ($k$) is set to $0.1$.
Figure 2: Weight-matrix singular value spectrum for different augmentation amplitudes $k$, measured at the end of training. Solid lines depict the model without the regularizer and dotted lines depict the model + WERank. Left: EMA model; middle: VICReg model; right: InfoNCE model.
Figure 3: The rank of the representation and embedding spaces on PPI and ogbn-arXiv. Curves are smoothed for clarity.
Figure 4: Percent improvement of BGRL + WERank under different coefficients over BGRL + WERank with coefficient 1. The same coefficient $\alpha$ is applied to every layer in the encoder.
Figure 5: The singular values of the weight matrices and of the embedding-space covariance matrix during training. Top: InfoNCE model with no regularization; bottom: InfoNCE model with the WERank regularizer. The augmentation magnitude ($k$) is set to $0.1$.
...and 3 more figures
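Figure 3 reports the rank of the representation and embedding spaces, but the captions do not state which rank measure is used. One common continuous choice is the effective rank of Roy and Vetterli: the exponential of the Shannon entropy of the normalized singular values. The sketch below assumes that measure (an assumption, not confirmed by this summary); `effective_rank` is a hypothetical helper name. It can be applied to singular values such as those obtained from an SVD of an embedding covariance matrix.

```python
import numpy as np


def effective_rank(singular_values, eps=1e-12):
    """Entropy-based effective rank (assumed measure): exp(H(p)), p_i = s_i / sum_j s_j."""
    s = np.asarray(singular_values, dtype=float)
    s = s[s > eps]                         # drop (numerically) zero singular values
    if s.size == 0:
        return 0.0
    p = s / s.sum()                        # normalize the spectrum to a distribution
    entropy = -np.sum(p * np.log(p))       # Shannon entropy in nats
    return float(np.exp(entropy))


# Flat spectrum over 8 directions: effective rank of about 8.
print(effective_rank(np.ones(8)))
# Same ambient dimension, only 2 equal non-zero directions: about 2.
print(effective_rank([1.0, 1.0, 0, 0, 0, 0, 0, 0]))
# Skewed spectrum: strictly between 1 and the number of non-zero values (4).
print(effective_rank([10.0, 1.0, 0.1, 0.01]))
```

Unlike a thresholded count, this measure changes smoothly as small singular values shrink, so gradual rank degradation during training shows up as a steady decline rather than a sudden drop.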
