Global Optimality of Local Search for Low Rank Matrix Recovery
Srinadh Bhojanapalli, Behnam Neyshabur, Nathan Srebro
TL;DR
We address the recovery of a low-rank PSD matrix $\bm{X}^*$ from linear measurements via the non-convex factorization $\bm{X}=\bm{U}\bm{U}^T$ and the objective $f(\bm{U})=\|\mathcal{A}(\bm{U}\bm{U}^T)-\bm{y}\|^2$. When the measurement operator $\mathcal{A}$ satisfies $(2r,\delta_{2r})$-RIP with $\delta_{2r}<1/5$, $f$ has no spurious local minima in the noiseless case; with noisy measurements, all local minima are close to the global optimum. Saddle points have negative curvature, so SGD from random initialization converges to a global optimum in polynomial time. The results extend to approximately low-rank matrices with explicit error bounds and give near-optimal sample complexity for Gaussian measurements, and counterexamples in which RIP fails establish that the RIP assumption is necessary. Together, these results give a theoretical justification for the non-convex matrix factorization methods used in practice, and the landscape analysis may offer insight into broader rank-constrained problems and deep networks.
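As an illustration of the objective $f(\bm{U})$ above, here is a minimal NumPy sketch of local search on a small noiseless instance. Everything specific in it is an assumption made for illustration and is not taken from the paper: the problem sizes (`n=8`, `r=2`, `m=2000`), the symmetric Gaussian measurement matrices scaled by $1/\sqrt{m}$, the step size, the initialization scale, and the helper names `make_problem`, `objective`, `gradient`, and `gradient_descent`. It uses plain full-batch gradient descent rather than SGD, and it does not check that the sampled operator actually satisfies RIP.

```python
import numpy as np


def make_problem(n=8, r=2, m=2000, seed=0):
    """Build a small noiseless instance (sizes are illustrative assumptions)."""
    rng = np.random.default_rng(seed)
    # Ground truth X* = U* U*^T with orthonormal U*, so X* is PSD with rank r.
    U_star, _ = np.linalg.qr(rng.standard_normal((n, r)))
    X_star = U_star @ U_star.T
    # Symmetric Gaussian measurement matrices A_i, scaled by 1/sqrt(m).
    G = rng.standard_normal((m, n, n))
    A = (G + G.transpose(0, 2, 1)) / (2.0 * np.sqrt(m))
    A_mat = A.reshape(m, n * n)      # row i is the flattened A_i
    y = A_mat @ X_star.ravel()       # y_i = <A_i, X*>
    return A_mat, y, X_star


def objective(U, A_mat, y):
    """f(U) = ||A(U U^T) - y||^2."""
    resid = A_mat @ (U @ U.T).ravel() - y
    return float(resid @ resid)


def gradient(U, A_mat, y):
    """Gradient of f for symmetric A_i: 4 * (sum_i resid_i * A_i) @ U."""
    n = U.shape[0]
    resid = A_mat @ (U @ U.T).ravel() - y
    S = (A_mat.T @ resid).reshape(n, n)
    return 4.0 * S @ U


def gradient_descent(A_mat, y, n, r, step=0.02, iters=1500, seed=1):
    """Full-batch gradient descent from a small random initialization."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((n, r))
    for _ in range(iters):
        U = U - step * gradient(U, A_mat, y)
    return U


A_mat, y, X_star = make_problem()
U_hat = gradient_descent(A_mat, y, n=X_star.shape[0], r=2)
rel_err = np.linalg.norm(U_hat @ U_hat.T - X_star) / np.linalg.norm(X_star)
print(f"relative recovery error: {rel_err:.2e}")
```

The error is measured on $\bm{U}\bm{U}^T$ rather than on $\bm{U}$ itself, because $\bm{U}$ is only determined up to an orthogonal rotation.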
Abstract
We show that there are no spurious local minima in the non-convex factorized parametrization of low-rank matrix recovery from incoherent linear measurements. With noisy measurements, we show that all local minima are very close to a global optimum. Together with a curvature bound at saddle points, this yields a polynomial-time global convergence guarantee for stochastic gradient descent from random initialization.
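To make the terms "local minimum" and "curvature at saddle points" concrete, the first- and second-order derivatives of the factorized objective can be written out. This is a sketch under assumptions not stated in the abstract: the measurements have the form $\mathcal{A}(\bm{X})_i=\langle \bm{A}_i,\bm{X}\rangle$ with symmetric $\bm{A}_i$, $i=1,\dots,m$, and the objective is the unnormalized $f(\bm{U})=\|\mathcal{A}(\bm{U}\bm{U}^T)-\bm{y}\|^2$. Expanding $f(\bm{U}+t\bm{\Delta})$ to second order in $t$ gives

$$
\nabla f(\bm{U}) = 4\sum_{i=1}^{m}\big(\langle \bm{A}_i,\bm{U}\bm{U}^T\rangle-y_i\big)\,\bm{A}_i\bm{U},
$$

$$
\nabla^2 f(\bm{U})[\bm{\Delta},\bm{\Delta}] = 2\sum_{i=1}^{m}\langle \bm{A}_i,\bm{U}\bm{\Delta}^T+\bm{\Delta}\bm{U}^T\rangle^2 + 4\sum_{i=1}^{m}\big(\langle \bm{A}_i,\bm{U}\bm{U}^T\rangle-y_i\big)\,\langle \bm{A}_i,\bm{\Delta}\bm{\Delta}^T\rangle.
$$

A local minimum satisfies $\nabla f(\bm{U})=0$ and $\nabla^2 f(\bm{U})[\bm{\Delta},\bm{\Delta}]\ge 0$ for every direction $\bm{\Delta}$. A saddle point with negative curvature is a stationary point at which some $\bm{\Delta}$ makes the quadratic form strictly negative, which is what allows a perturbed descent method to escape it.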
