Projected Gradient Descent Algorithm for Low-Rank Matrix Estimation

Teng Zhang; Xing Fan

TL;DR

The effectiveness of the projected gradient descent algorithm is demonstrated, and it is established that there are no spurious local minimizers in estimating asymmetric low-rank matrices when the objective function satisfies $L/\mu<3$.

Abstract

Most existing methods for estimating low-rank matrices rely on Burer-Monteiro factorization, but these approaches can converge slowly, especially when the solution has a large condition number, defined as the ratio of the largest to the $r$-th singular value, where $r$ is the search rank. While methods such as Scaled Gradient Descent have been proposed to address this issue, they are more complicated and sometimes have weaker theoretical guarantees, for example in the rank-deficient setting. In contrast, this paper demonstrates the effectiveness of the projected gradient descent algorithm. First, its local convergence rate is independent of the condition number. Second, when the objective function is rank-$2r$ restricted $L$-smooth and $\mu$-strongly convex with $L/\mu<3$, projected gradient descent with an appropriate step size converges linearly to the solution. Moreover, a perturbed version of this algorithm escapes saddle points, converging to an approximate solution or a second-order local minimizer across a wide range of step sizes. Furthermore, we establish that there are no spurious local minimizers in estimating asymmetric low-rank matrices when the objective function satisfies $L/\mu<3$.
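To make the iteration described in the abstract concrete, here is a minimal NumPy sketch of projected gradient descent, in which each gradient step is followed by a projection onto matrices of rank at most $r$ via a truncated SVD. It is an illustration under stated assumptions, not the authors' implementation. The helper names (`project_rank`, `proj_gd`, `make_target`) are hypothetical, as is the toy objective $f(\mathbf{X})=\tfrac12\|\mathbf{X}-\mathbf{M}\|_F^2$, chosen because it has $L=\mu=1$ and so satisfies $L/\mu<3$. The zero initialization and the step size $\eta=0.4$ are also assumptions made for the example.

```python
import numpy as np

def project_rank(X, r):
    """Best rank-r approximation of X in Frobenius norm (truncated SVD)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

def proj_gd(grad, X0, r, eta, n_iters):
    """Projected gradient descent: gradient step, then rank-r projection.

    grad is a callable returning the gradient of the objective at X.
    """
    X = project_rank(X0, r)
    for _ in range(n_iters):
        X = project_rank(X - eta * grad(X), r)
    return X

def make_target(n, sing_vals, rng):
    """Hypothetical helper: n x n matrix with prescribed nonzero singular values."""
    r = len(sing_vals)
    U, _ = np.linalg.qr(rng.standard_normal((n, r)))
    V, _ = np.linalg.qr(rng.standard_normal((n, r)))
    return (U * np.asarray(sing_vals)) @ V.T

# Toy problem (an assumption, not from the paper): f(X) = 0.5 * ||X - M||_F^2,
# with a rank-2 target whose condition number is 20.
rng = np.random.default_rng(0)
n, r = 10, 2
M = make_target(n, [20.0, 1.0], rng)
grad = lambda X: X - M

X_hat = proj_gd(grad, np.zeros((n, n)), r, eta=0.4, n_iters=100)
rel_err = np.linalg.norm(X_hat - M) / np.linalg.norm(M)
print(f"relative error: {rel_err:.2e}")
```

In this toy setting every iterate is a scalar multiple of $\mathbf{M}$, so the error shrinks by a factor of $1-\eta$ per step whatever the singular values of $\mathbf{M}$ are. A realistic objective such as matrix sensing would replace `grad` with the gradient of its own loss.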

Paper Structure (20 sections, 15 theorems, 138 equations, 3 figures, 2 algorithms)

Introduction
Main Results
Related literature
Organization
Background
Notation
Projected and factored gradient descent algorithms
Main Results
Local convergence of ProjGD
Global convergence of ProjGD
Global convergence of perturbed projected gradient descent (PprojGD)
Numerical Experiments
Conclusion
Appendix
Sketch of Proof of Theorem 1
...and 5 more sections

Key Result

Theorem 1

[Local convergence rate] Under Assumptions A1-A2, there exists $c_0>0$ such that for any initialization $\mathbf{X}^{(0)}$ satisfying $f(\mathbf{X}^{(0)})-f(\mathbf{X}_*)\leq 0.01\sigma_{r_*}(\mathbf{X}_*)^2\mu/\kappa_f$, where $\kappa_f=L/\mu$, then ProjGD with a step size $\eta<1/2L$ converges lin

Figures (3)

Figure 1: Comparison of ProjGD, FGD, and ScaledGD algorithms for the estimation of asymmetric matrices. Identical step sizes ($\eta = 0.4$ in the first row and $\eta = 0.6$ in the second row) were employed for all three algorithms, with matrix dimensions set to $n = 10$ and ranks of $r_* = 4$ or $r_* = 2$. Notably, only ProjGD exhibits consistent linear convergence towards the solution.
Figure 2: Comparison of ProjGD, FGD, ScaledGD, and PrecGD algorithms for the estimation of positive semidefinite matrices. Identical step sizes ($\eta = 0.4$ in the first row and $\eta = 0.6$ in the second row) were employed for all four algorithms, with matrix dimensions set to $n = 10$ and ranks of $r_* = 4$ or $r_* = 2$.
Figure 3: Relative errors of ProjGD, ScaledGD, and FGD after 80 iterations for step sizes $\eta$ from $0.1$ to $1.2$, under condition numbers $\kappa = 1, 20$, for matrix sensing with $n = 10$, $r=r_*=4$, and $m=10nr$.

Theorems & Definitions (15)

Theorem 1
Theorem 2
Theorem 3: Approximate second-order optimality of PprojGD
Corollary 1
Lemma 1: Decrease in functional value
Lemma 2: Lower bound of $\|\mathbf{X}-\mathbf{X}^+\|_F$
Lemma 3: Local approximation by tangent space
Lemma 4
Lemma 5: Bound on derivative
Lemma 6: Change over iterations
...and 5 more
