
Scaled Gradient Descent for Ill-Conditioned Low-Rank Matrix Recovery with Optimal Sampling Complexity

Zhenxuan Li, Meng Huang

Abstract

The low-rank matrix recovery problem seeks to reconstruct an unknown $n_1 \times n_2$ rank-$r$ matrix from $m$ linear measurements, where $m\ll n_1n_2$. This problem has been extensively studied over the past few decades, leading to a variety of algorithms with solid theoretical guarantees. Among these, gradient-descent-based non-convex methods have become particularly popular due to their computational efficiency. However, these methods typically suffer from two key limitations: a sub-optimal sample complexity of $O((n_1 + n_2)r^2)$ and an iteration complexity of $O(\kappa\log(1/\varepsilon))$ to achieve $\varepsilon$-accuracy, resulting in slow convergence when the target matrix is ill-conditioned. Here, $\kappa$ denotes the condition number of the unknown matrix. Recent studies show that a preconditioned variant of GD, known as scaled gradient descent (ScaledGD), can significantly reduce the iteration complexity to $O(\log(1/\varepsilon))$. Nonetheless, its sample complexity remains sub-optimal at $O((n_1 + n_2)r^2)$. In contrast, a delicate virtual sequence technique demonstrates that standard GD in the positive semidefinite (PSD) setting achieves the optimal sample complexity $O((n_1 + n_2)r)$, but converges more slowly, with an iteration complexity of $O(\kappa^2 \log(1/\varepsilon))$. In this paper, through a more refined analysis, we show that ScaledGD achieves both the optimal sample complexity $O((n_1 + n_2)r)$ and the improved iteration complexity $O(\log(1/\varepsilon))$. Notably, our results extend beyond the PSD setting to the general low-rank matrix recovery problem. Numerical experiments further validate that ScaledGD accelerates convergence for ill-conditioned matrices under the optimal sampling complexity.
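To make the algorithm concrete, the following is a minimal NumPy sketch of the ScaledGD iteration for the factorized model $\boldsymbol{X} = \boldsymbol{L}\boldsymbol{R}^\top$ under linear measurements $y_i = \langle \boldsymbol{A}_i, \boldsymbol{X}_{\star}\rangle$. The spectral initialization, step size, and loss normalization below are common conventions assumed for illustration, not necessarily the exact choices analyzed in the paper.

```python
import numpy as np

def scaled_gd(y, A, r, eta=0.5, iters=200):
    """Sketch of ScaledGD for y_i = <A_i, X_star> with X_star factorized as L @ R.T.

    y : (m,) measurements; A : (m, n1, n2) measurement matrices.
    The spectral initialization, step size eta, and 1/m loss normalization are
    illustrative assumptions, not the paper's exact parameter choices.
    """
    m = y.shape[0]
    # Spectral initialization: top-r factors of (1/m) * sum_i y_i A_i.
    Y = np.tensordot(y, A, axes=1) / m
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    L = U[:, :r] * np.sqrt(s[:r])
    R = Vt[:r].T * np.sqrt(s[:r])

    for _ in range(iters):
        residual = np.einsum('mij,ij->m', A, L @ R.T) - y   # <A_i, L R^T> - y_i
        G = np.tensordot(residual, A, axes=1) / m           # gradient w.r.t. the full matrix X
        # Scaled (preconditioned) updates: right-multiply by (R^T R)^{-1} and (L^T L)^{-1}.
        L_new = L - eta * G @ R @ np.linalg.inv(R.T @ R)
        R_new = R - eta * G.T @ L @ np.linalg.inv(L.T @ L)
        L, R = L_new, R_new
    return L @ R.T
```

The only difference from plain factorized gradient descent is the right-preconditioning by $(\boldsymbol{R}^\top\boldsymbol{R})^{-1}$ and $(\boldsymbol{L}^\top\boldsymbol{L})^{-1}$, which is what removes the dependence on $\kappa$ from the iteration complexity discussed above.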


Paper Structure

This paper contains 20 sections, 16 theorems, 156 equations, 3 figures, 1 table, 2 algorithms.

Key Result

Theorem 3.1

Let $\boldsymbol{X}_{\star} \in \mathbb{R}^{n_1 \times n_2}$ with $\mathop{\mathrm{rank}}\nolimits(\boldsymbol{X}_{\star})=r$, and let $\boldsymbol{A}_1,\ldots,\boldsymbol{A}_m \in \mathbb{R}^{n_1 \times n_2}$ be Gaussian random matrices with i.i.d. entries distributed as $\mathcal{N}(0,1)$. Then the linear convergence guarantees of ScaledGD hold for all iterations $t\ge 0$, provided $m \ge C \left(n_1+n_2\right) r \kappa^2$. Here, $\kappa$ denotes the condition number of $\boldsymbol{X}_{\star}$ and $C$ is a universal constant.
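For concreteness, the toy snippet below instantiates the measurement model of Theorem 3.1 (i.i.d. $\mathcal{N}(0,1)$ matrices $\boldsymbol{A}_i$ and $y_i = \langle \boldsymbol{A}_i, \boldsymbol{X}_{\star}\rangle$) with an ill-conditioned ground truth and feeds it to the `scaled_gd` sketch given after the abstract. The problem sizes and the oversampling factor are illustrative choices, not the theorem's constant $C$.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, r, kappa = 40, 30, 3, 10

# Rank-r ground truth with condition number kappa (singular values from kappa down to 1).
U, _ = np.linalg.qr(rng.standard_normal((n1, r)))
V, _ = np.linalg.qr(rng.standard_normal((n2, r)))
X_star = U @ np.diag(np.linspace(kappa, 1.0, r)) @ V.T

# m scales as (n1 + n2) r, the optimal sample-complexity order from the abstract;
# the factor 5 is an arbitrary illustrative constant, not the theorem's C.
m = 5 * (n1 + n2) * r
A = rng.standard_normal((m, n1, n2))        # i.i.d. N(0,1) Gaussian measurement matrices
y = np.einsum('mij,ij->m', A, X_star)       # y_i = <A_i, X_star>

X_hat = scaled_gd(y, A, r)                  # ScaledGD sketch defined earlier
print(np.linalg.norm(X_hat - X_star) / np.linalg.norm(X_star))
```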

Figures (3)

  • Figure 1: Relative error versus iteration count (left); relative error versus runtime (right).
  • Figure 2: Time cost of different methods under varying condition numbers.
  • Figure 3: Phase transition diagrams: $m$ vs $r$. Black indicates failure, and white indicates success.

Theorems & Definitions (29)

  • Theorem 3.1
  • Remark 3.2
  • Definition 4.1: RIP
  • Lemma 4.2
  • Lemma 4.3
  • Lemma 4.4
  • Lemma 4.5
  • proof
  • Lemma 4.6
  • proof
  • ...and 19 more