Table of Contents
Fetching ...

Weak-SIGReg: Covariance Regularization for Stable Deep Learning

Habibullah Akbar

TL;DR

This work adopts Sketched Isotropic Gaussian Regularization (SIGReg), recently introduced in the LeJEPA self-supervised framework, and repurposes it as a general optimization stabilizer for supervised learning.

Abstract

Modern neural network optimization relies heavily on architectural priorssuch as Batch Normalization and Residual connectionsto stabilize training dynamics. Without these, or in low-data regimes with aggressive augmentation, low-bias architectures like Vision Transformers (ViTs) often suffer from optimization collapse. This work adopts Sketched Isotropic Gaussian Regularization (SIGReg), recently introduced in the LeJEPA self-supervised framework, and repurposes it as a general optimization stabilizer for supervised learning. While the original formulation targets the full characteristic function, a computationally efficient variant is derived, Weak-SIGReg, which targets the covariance matrix via random sketching. Inspired by interacting particle systems, representation collapse is viewed as stochastic drift; SIGReg constrains the representation density towards an isotropic Gaussian, mitigating this drift. Empirically, SIGReg recovers the training of a ViT on CIFAR-100 from a collapsed 20.73\% to 72.02\% accuracy without architectural hacks and significantly improves the convergence of deep vanilla MLPs trained with pure SGD. Code is available at \href{https://github.com/kreasof-ai/sigreg}{github.com/kreasof-ai/sigreg}.

Weak-SIGReg: Covariance Regularization for Stable Deep Learning

TL;DR

This work adopts Sketched Isotropic Gaussian Regularization (SIGReg), recently introduced in the LeJEPA self-supervised framework, and repurposes it as a general optimization stabilizer for supervised learning.

Abstract

Modern neural network optimization relies heavily on architectural priorssuch as Batch Normalization and Residual connectionsto stabilize training dynamics. Without these, or in low-data regimes with aggressive augmentation, low-bias architectures like Vision Transformers (ViTs) often suffer from optimization collapse. This work adopts Sketched Isotropic Gaussian Regularization (SIGReg), recently introduced in the LeJEPA self-supervised framework, and repurposes it as a general optimization stabilizer for supervised learning. While the original formulation targets the full characteristic function, a computationally efficient variant is derived, Weak-SIGReg, which targets the covariance matrix via random sketching. Inspired by interacting particle systems, representation collapse is viewed as stochastic drift; SIGReg constrains the representation density towards an isotropic Gaussian, mitigating this drift. Empirically, SIGReg recovers the training of a ViT on CIFAR-100 from a collapsed 20.73\% to 72.02\% accuracy without architectural hacks and significantly improves the convergence of deep vanilla MLPs trained with pure SGD. Code is available at \href{https://github.com/kreasof-ai/sigreg}{github.com/kreasof-ai/sigreg}.
Paper Structure (16 sections, 3 figures, 7 tables)

This paper contains 16 sections, 3 figures, 7 tables.

Figures (3)

  • Figure 1: Visualizing Optimization Dynamics. (Left) Stochastic Collapse: In standard SGD without normalization, representations (blue dots) tend to collapse into low-dimensional manifolds (red arrows). (Center) Strong-SIGReg: Forces the distribution towards a perfect isotropic sphere. (Right) Weak-SIGReg (Ours): Targets only the covariance, preventing collapse (green arrows) while allowing more geometric flexibility (star shape), sufficient for supervised stability.
  • Figure 2: Weak SIGReg Architecture. A high-dimensional batch $Z$ ($N \times C$) is projected via a random sketch matrix $S$ ($C \times K$) into a lower-dimensional embedding. The covariance of this sketch is computed and forced towards the Identity matrix $I$. This avoids computing the generic $C \times C$ covariance, reducing memory cost to $O(CK)$.
  • Figure 3: Weak SIGReg Implementation