Implicit Bias in Matrix Factorization and its Explicit Realization in a New Architecture
Yikun Hou, Suvrit Sra, Alp Yurtsever
TL;DR
This work investigates implicit bias in gradient-based matrix factorization and introduces a stable, explicit realization via $X = UDU^\top$, with $U$ restricted to a Frobenius-norm ball and $D$ diagonal and nonnegative. The proposed UDU formulation yields truly low-rank solutions in matrix completion and Fourier ptychography, addressing limitations of the classical Burer-Monteiro (BM) factorization. The authors extend the idea to neural networks with a constrained diagonal layer (UDV), achieving competitive performance while exhibiting a pronounced low-rank bias and enabling effective SVD-based pruning to produce compact models. A fixed-point analysis relates the new method to BM and clarifies the mechanisms that promote low-rank structure during training, and practical results demonstrate robustness across datasets, optimizers, and LoRA-based fine-tuning. Overall, the approach offers a principled route to structured, memory-efficient representations with strong implicit regularization, of value for both theory and practical model compression.
Abstract
Gradient descent for matrix factorization exhibits an implicit bias toward approximately low-rank solutions. While existing theories often assume the boundedness of iterates, empirically the bias persists even with unbounded sequences. This reflects a dynamic where factors develop low-rank structure while their magnitudes increase, tending to align with certain directions. To capture this behavior in a stable way, we introduce a new factorization model: $X\approx UDV^\top$, where $U$ and $V$ are constrained within norm balls, while $D$ is a diagonal factor allowing the model to span the entire search space. Experiments show that this model consistently exhibits a strong implicit bias, yielding truly (rather than approximately) low-rank solutions. Extending the idea to neural networks, we introduce a new model featuring constrained layers and diagonal components that achieves competitive performance on various regression and classification tasks while producing lightweight, low-rank representations.
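To make the constrained factorization concrete, the following is a minimal sketch of projected gradient descent on $X\approx UDV^\top$ for a matrix-completion loss. It is an illustration under stated assumptions, not the authors' implementation: the loss $\tfrac12\|\mathcal{P}_\Omega(UDV^\top - M)\|_F^2$, the Frobenius-norm ball of radius `radius`, the nonnegativity of the diagonal, the initialization, the step size, and all function names are hypothetical choices made here.

```python
import numpy as np


def project_frobenius_ball(A, radius):
    """Project A onto {A : ||A||_F <= radius} by rescaling when needed."""
    norm = np.linalg.norm(A)  # Frobenius norm for a 2-D array
    return A if norm <= radius else A * (radius / norm)


def loss_and_grads(U, d, V, M, mask):
    """Return 0.5 * ||mask * (U diag(d) V^T - M)||_F^2 and its gradients."""
    R = mask * ((U * d) @ V.T - M)    # residual on observed entries only
    RV = R @ V
    loss = 0.5 * np.sum(R ** 2)
    grad_U = RV * d                   # R V diag(d)
    grad_V = (R.T @ U) * d            # R^T U diag(d)
    grad_d = np.sum(U * RV, axis=0)   # diag(U^T R V)
    return loss, grad_U, grad_d, grad_V


def fit_udv(M, mask, r, radius=1.0, step=0.01, iters=500, seed=0):
    """Projected gradient descent; all hyperparameters are assumptions."""
    rng = np.random.default_rng(seed)
    m, n = M.shape
    U = project_frobenius_ball(rng.standard_normal((m, r)), radius)
    V = project_frobenius_ball(rng.standard_normal((n, r)), radius)
    d = np.ones(r)                    # diagonal of D, kept nonnegative (assumed)
    for _ in range(iters):
        _, gU, gd, gV = loss_and_grads(U, d, V, M, mask)
        U = project_frobenius_ball(U - step * gU, radius)
        V = project_frobenius_ball(V - step * gV, radius)
        d = np.maximum(d - step * gd, 0.0)
    return U, d, V


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic rank-2 target with roughly 60% of entries observed.
    M = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 15))
    mask = (rng.random(M.shape) < 0.6).astype(float)
    U, d, V = fit_udv(M, mask, r=8)
    print("diagonal entries:", np.round(d, 3))
```

Because $U$ and $V$ stay inside the norm balls, only the diagonal can grow in magnitude, so inspecting `d` after training is one simple way to look for entries driven to zero. Whether and how quickly that happens in this toy setup depends on the assumed step size and iteration count.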
