Implicit Regularization in Deep Matrix Factorization
Sanjeev Arora, Nadav Cohen, Wei Hu, Yuping Luo
TL;DR
The paper analyzes implicit regularization in deep linear networks for matrix completion and sensing (deep matrix factorization) under gradient descent. It shows that greater depth strengthens the bias toward low-rank solutions and can outperform nuclear-norm minimization in data-sparse regimes, while challenging the sufficiency of Schatten-$p$ or nuclear norms as universal descriptors. Through end-to-end gradient-flow analysis, it derives depth-dependent singular-value dynamics, revealing that depth accelerates growth of large singular values and suppresses small ones, effectively imposing a deeper low-rank bias. The findings emphasize the importance of optimization trajectories and dynamics for generalization in deep linear (and potentially non-linear) networks, motivating further study beyond traditional norm-based regularizers.
Abstract
Efforts to understand the generalization mystery in deep learning have led to the belief that gradient-based optimization induces a form of implicit regularization, a bias towards models of low "complexity." We study the implicit regularization of gradient descent over deep linear neural networks for matrix completion and sensing, a model referred to as deep matrix factorization. Our first finding, supported by theory and experiments, is that adding depth to a matrix factorization enhances an implicit tendency towards low-rank solutions, oftentimes leading to more accurate recovery. Secondly, we present theoretical and empirical arguments questioning a nascent view by which implicit regularization in matrix factorization can be captured using simple mathematical norms. Our results point to the possibility that the language of standard regularizers may not be rich enough to fully encompass the implicit regularization brought forth by gradient-based optimization.
