$k$-SVD with Gradient Descent
Yassir Jedra, Devavrat Shah
TL;DR
This work develops a gradient-descent method for computing the leading $k$-SVD of a matrix $M$ of rank $d$, using a simple, parameter-free step-size and random initialization. The authors prove global linear convergence by showing that the iterates enter an attracting region where the dynamics emulate Heron’s method for the top singular value, and extend the approach to sequentially recover $\sigma_1,\dots,\sigma_k$ and $u_1,\dots,u_k$, including in the under-parameterized case ($k < d$). They further apply Nesterov’s acceleration to improve the convergence rates and report favorable empirical performance on synthetic and real-world matrices, with runtimes competitive with Lanczos-based approaches. The results offer a scalable, robust alternative for large-scale $k$-SVD and deepen understanding of gradient-based methods for nonconvex matrix factorization, including the role of preconditioning and geometric regions of attraction.
Abstract
The emergence of modern compute infrastructure for iterative optimization has led to great interest in developing optimization-based approaches for scalable computation of the $k$-SVD, i.e., the $k\geq 1$ largest singular values and corresponding vectors of a matrix of rank $d \geq 1$. Despite much exciting recent work, all prior approaches fall short in this pursuit. Specifically, existing results either cover only the exact-parameterized (i.e., $k = d$) and over-parameterized (i.e., $k > d$) settings, or establish only local convergence guarantees, or use a step-size that requires problem-instance-specific, oracle-provided information. In this work, we complete this pursuit by providing a gradient-descent method with a simple, universal rule for step-size selection (akin to pre-conditioning) that provably finds the $k$-SVD of a matrix of any rank $d \geq 1$. We establish that the gradient method with random initialization enjoys global linear convergence for any $k, d \geq 1$. Our convergence analysis reveals that the gradient method has an attractive region, and within this attractive region, the method behaves like Heron's method (a.k.a. the Babylonian method). Our analytic results about this attractive region imply that the gradient method can be enhanced by means of Nesterov's momentum-based acceleration technique. The resulting improved convergence rates match those of rather complicated methods that typically rely on Lanczos iterations or variants thereof. We provide an empirical study to validate the theoretical results.
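
The link to Heron's method can be illustrated with a small sketch. The abstract does not state the objective or the step-size rule, so the following are assumptions made only for illustration: $M$ is symmetric positive semidefinite, the objective is $g(x) = \frac{1}{4}\|M - xx^\top\|_F^2$, and the step-size is $\eta/\|x\|^2$ with $\eta = 1/2$. Under these assumptions one gradient step reduces to $x \leftarrow \frac{1}{2}\left(x + Mx/\|x\|^2\right)$, which in the scalar case is exactly Heron's iteration $x \leftarrow \frac{1}{2}(x + a/x)$. The function names `heron_sqrt`, `matvec`, and `top_eigpair_gd` are hypothetical.

```python
import math
import random


def heron_sqrt(a, x0=1.0, iters=30):
    """Heron's (Babylonian) iteration x <- (x + a/x)/2 for sqrt(a), a > 0."""
    x = x0
    for _ in range(iters):
        x = 0.5 * (x + a / x)
    return x


def matvec(M, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def top_eigpair_gd(M, iters=500, seed=0):
    """Gradient descent on g(x) = ||M - x x^T||_F^2 / 4 for symmetric PSD M.

    Assumed (not taken from the abstract): step-size eta / ||x||^2 with
    eta = 1/2 and Gaussian random initialization.
    Returns (top eigenvalue estimate, unit-norm eigenvector estimate).
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in M]
    for _ in range(iters):
        sq = sum(v * v for v in x)                         # ||x||^2
        Mx = matvec(M, x)
        grad = [sq * xi - mxi for xi, mxi in zip(x, Mx)]   # gradient of g at x
        # Preconditioned step; algebraically 0.5 * (x + M x / ||x||^2).
        x = [xi - 0.5 / sq * gi for xi, gi in zip(x, grad)]
    sigma = sum(v * v for v in x)                          # ||x||^2 at the fixed point
    return sigma, [v / math.sqrt(sigma) for v in x]


if __name__ == "__main__":
    # Symmetric PSD example with eigenvalues 3 and 1.
    sigma, u = top_eigpair_gd([[2.0, 1.0], [1.0, 2.0]])
    print(sigma, u)
    print(heron_sqrt(2.0))
```

In this sketch the squared norm of the iterate plays the role of the top eigenvalue estimate, and the components along the other eigenvectors shrink at a rate governed by the spectral gap. Recovering further singular values sequentially, handling rectangular matrices, and Nesterov acceleration are not covered here.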
