WERank: Towards Rank Degradation Prevention for Self-Supervised Learning Using Weight Regularization
Ali Saheb Pasand, Reza Moravej, Mahdi Biparva, Ali Ghodsi
TL;DR
WERank addresses rank degradation and dimensional collapse in self-supervised learning by introducing a layer-wise weight regularization term that nudges each layer toward a near-orthonormal mapping. The method adds a simple regularizer, $\mathcal{L}_{reg} = \sum_{l=1}^{L} \alpha_l \|W_l^T W_l - I\|_F$, to the SSL objective, promoting rank preservation across the network without relying on costly per-batch whitening. Empirical results on graph SSL (notably with BGRL/BYOL-style setups) show that WERank helps maintain higher rank in both representations and embeddings, yielding modest but consistent downstream gains, especially under weak data augmentation. The work provides theoretical motivation, toy-example validation, and graph-domain experiments, presenting WERank as a practical tool that complements existing SSL methods.
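To make the regularizer above concrete, here is a minimal NumPy sketch of the penalty $\sum_{l} \alpha_l \|W_l^T W_l - I\|_F$. It is an illustration under stated assumptions, not the authors' implementation: the function name `werank_penalty`, the convention that each $W_l$ is stored as a 2-D array whose columns are compared against the identity, and the choice of $\alpha_l = 1$ are all hypothetical. In training, the same quantity would presumably be computed in an autodiff framework and added to the SSL loss.

```python
import numpy as np

def werank_penalty(weights, alphas):
    """Return sum_l alpha_l * ||W_l^T W_l - I||_F for a list of 2-D weight arrays.

    Assumption: each W_l is a plain 2-D array; the identity has the size of
    W_l^T W_l, i.e. the number of columns of W_l.
    """
    total = 0.0
    for W, alpha in zip(weights, alphas):
        gram = W.T @ W                        # Gram matrix of the columns of W
        identity = np.eye(gram.shape[0])
        total += alpha * np.linalg.norm(gram - identity, ord="fro")
    return total

rng = np.random.default_rng(0)

# A weight matrix with orthonormal columns (from a reduced QR factorization).
orthonormal, _ = np.linalg.qr(rng.standard_normal((8, 4)))

# A rank-1 weight matrix, standing in for a collapsed layer.
collapsed = np.outer(rng.standard_normal(8), rng.standard_normal(4))

penalty_orthonormal = werank_penalty([orthonormal], [1.0])
penalty_collapsed = werank_penalty([collapsed], [1.0])
print(penalty_orthonormal, penalty_collapsed)
```

The orthonormal matrix incurs an essentially zero penalty, while the rank-1 matrix is penalized: its Gram matrix has three zero eigenvalues, so $W^T W - I$ has Frobenius norm of at least $\sqrt{3}$. This is the sense in which the term pushes each layer away from rank-deficient mappings.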
Abstract
A common phenomenon limiting representation quality in Self-Supervised Learning (SSL) is dimensional collapse (also known as rank degeneration), where the learned representations are mapped to a low-dimensional subspace of the representation space. State-of-the-art SSL methods have been shown to suffer from dimensional collapse and fail to maintain full rank. Recent approaches to prevent this problem have proposed using contrastive losses, regularization techniques, or architectural tricks. We propose WERank, a new regularizer on the weight parameters of the network to prevent rank degeneration at different layers of the network. We provide empirical evidence and mathematical justification to demonstrate the effectiveness of the proposed regularization method in preventing dimensional collapse. We verify the impact of WERank on graph SSL, where dimensional collapse is more pronounced due to the lack of proper data augmentation. We empirically demonstrate that WERank helps BYOL achieve higher rank during SSL pre-training and, consequently, higher downstream accuracy during evaluation probing. Ablation studies and experimental analysis shed light on the underlying factors behind the performance gains of the proposed approach.
