Implicit Regularization in Matrix Factorization
Suriya Gunasekar, Blake Woodworth, Srinadh Bhojanapalli, Behnam Neyshabur, Nathan Srebro
TL;DR
The paper investigates implicit regularization in underdetermined matrix regression, where $X$ is parameterized by a full-dimensional factorization $X=UU^T$ and gradient descent is run on $U$. It derives the gradient-flow dynamics and conjectures that, with sufficiently small step sizes and initialization close to the origin, the limit point is the minimum nuclear norm solution among positive semidefinite $X$ satisfying $A(X)=y$. The conjecture is proved when the measurement matrices commute. The non-commuting case remains analytically hard and is instead supported by empirical evidence on synthetic and real data, which shows a bias toward low nuclear norm even when reconstruction is not guaranteed. These findings suggest that the optimization dynamics themselves can act as an implicit regularizer, with implications for generalization in non-convex matrix factorization and related architectures.
Abstract
We study implicit regularization when optimizing an underdetermined quadratic objective over a matrix $X$ with gradient descent on a factorization of $X$. We conjecture and provide empirical and theoretical evidence that with small enough step sizes and initialization close enough to the origin, gradient descent on a full dimensional factorization converges to the minimum nuclear norm solution.
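
The setting in the abstract can be illustrated with a small numerical sketch. The code below is not the authors' experimental code; the problem size, random seed, step size, iteration count, and all function names (`make_problem`, `factored_gd`, `min_frobenius_solution`) are assumptions chosen for illustration. It runs gradient descent on $U$ for the objective $\frac{1}{2}\|A(UU^T)-y\|^2$ from the near-origin initialization $U_0=\alpha I$, and prints the nuclear norm of the result next to that of the minimum Frobenius norm solution of $A(X)=y$, the point reached by gradient descent on $X$ itself when started at zero.

```python
import numpy as np


def make_problem(n=8, m=24, seed=0):
    """Random symmetric Gaussian measurements of a rank-1 PSD matrix (m < n*n)."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((m, n, n))
    # Symmetrize each A_k and scale so that E||A(X)||^2 = ||X||_F^2 for symmetric X.
    A = (G + G.transpose(0, 2, 1)) / (2.0 * np.sqrt(m))
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)
    X_star = np.outer(u, u)                   # ground truth, nuclear norm 1
    y = np.einsum("kij,ij->k", A, X_star)     # y_k = <A_k, X_star>
    return A, y, X_star


def loss(A, y, U):
    """Objective 0.5 * ||A(U U^T) - y||^2."""
    r = np.einsum("kij,ij->k", A, U @ U.T) - y
    return 0.5 * float(r @ r)


def factored_gd(A, y, alpha=1e-3, step=0.01, iters=5000):
    """Gradient descent on U with X = U U^T, started at U = alpha * I."""
    n = A.shape[1]
    U = alpha * np.eye(n)
    for _ in range(iters):
        r = np.einsum("kij,ij->k", A, U @ U.T) - y
        # For symmetric A_k the gradient in U is 2 * (sum_k r_k A_k) @ U.
        grad = 2.0 * np.einsum("k,kij->ij", r, A) @ U
        U = U - step * grad
    return U


def min_frobenius_solution(A, y):
    """Minimum Frobenius norm X with A(X) = y, via the pseudoinverse."""
    m, n, _ = A.shape
    M = A.reshape(m, n * n)
    return (np.linalg.pinv(M) @ y).reshape(n, n)


A, y, X_star = make_problem()
loss_init = loss(A, y, 1e-3 * np.eye(A.shape[1]))
U = factored_gd(A, y)
X_gd = U @ U.T
loss_final = loss(A, y, U)
X_fro = min_frobenius_solution(A, y)

print("loss at init / after GD:", loss_init, loss_final)
print("nuclear norm, GD on U:       ", np.linalg.norm(X_gd, "nuc"))
print("nuclear norm, min Frobenius: ", np.linalg.norm(X_fro, "nuc"))
print("nuclear norm, ground truth:  ", np.linalg.norm(X_star, "nuc"))
```

The conjecture concerns the limit of infinitesimal step size and initialization scale, so a finite run like this one can only suggest the behavior; shrinking `alpha` and `step` (and raising `iters` accordingly) moves the experiment closer to that regime.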
