Non-square matrix sensing without spurious local minima via the Burer-Monteiro approach

Dohyung Park, Anastasios Kyrillidis, Constantine Caramanis, Sujay Sanghavi

TL;DR

This work studies non-square matrix sensing under the Restricted Isometry Property (RIP) and analyzes the non-convex factorization X = UV^T. By stacking the factors as W = [U; V] and adding a balancing regularizer g(W) with λ = 1/4, the authors show that, under RIP, the non-convex parametrization does not introduce spurious local minima: any point satisfying the first- and second-order optimality conditions must be close to the balanced factorization of the true X^*. The key result gives an explicit bound: if W satisfies the first- and second-order optimality conditions, then (1 - 5δ_{2r} - 544δ_{4r}^2 - 1088δ_{2r}δ_{4r}^2)/(8(40+68δ_{2r})(1+δ_{2r})) · ||WW^T - W^*W^{*T}||_F^2 ≤ ||A(U^*V^{*T}) - b||^2. In the noiseless case the right-hand side is zero, so whenever the δ's are small enough for the leading constant to be positive, every such point recovers X^* exactly. The paper also treats the noisy and high-rank settings and establishes a strict saddle property, under which gradient-based methods can escape saddle points that are not global minima. Overall, the results support the Burer-Monteiro approach for non-square matrix sensing under RIP.
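
To get a feel for how small the δ's must be before the bound says anything, the leading constant can be evaluated directly. The snippet below is a minimal sketch that transcribes the constant from the bound quoted in the TL;DR; the function name `landscape_constant` and the sample δ values are hypothetical and chosen only for illustration.

```python
def landscape_constant(delta_2r: float, delta_4r: float) -> float:
    """Leading constant of the bound quoted in the TL;DR (hypothetical helper)."""
    numerator = (
        1
        - 5 * delta_2r
        - 544 * delta_4r ** 2
        - 1088 * delta_2r * delta_4r ** 2
    )
    denominator = 8 * (40 + 68 * delta_2r) * (1 + delta_2r)
    return numerator / denominator


# Illustrative values only; they are not taken from the paper.
small = landscape_constant(0.01, 0.01)  # positive, roughly 0.0027
large = landscape_constant(0.1, 0.1)    # negative, so the bound is vacuous

print(f"delta = 0.01: constant = {small:.6f}")
print(f"delta = 0.10: constant = {large:.6f}")
```

The denominator is always positive, so the sign of the constant is decided by the numerator alone. The 544δ_{4r}^2 term dominates quickly, which is why the guarantee is only informative for fairly small isometry constants.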

Abstract

We consider the non-square matrix sensing problem, under restricted isometry property (RIP) assumptions. We focus on the non-convex formulation, where any rank-$r$ matrix $X \in \mathbb{R}^{m \times n}$ is represented as $UV^\top$, where $U \in \mathbb{R}^{m \times r}$ and $V \in \mathbb{R}^{n \times r}$. In this paper, we complement recent findings on the non-convex geometry of the analogous PSD setting [5], and show that matrix factorization does not introduce any spurious local minima, under RIP.
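
The factored formulation can be sketched numerically. The example below is an illustration under stated assumptions rather than the paper's exact objective: it assumes a least-squares loss with a 1/2 factor, a balancing regularizer of the form $\|U^\top U - V^\top V\|_F^2$ (the summary names the regularizer and its weight 1/4 but does not spell out its form), and a Gaussian measurement operator. All function names are hypothetical.

```python
import numpy as np


def sensing_operator(A_mats, X):
    """Apply A(X)_i = <A_i, X> for a stack of measurement matrices A_i."""
    return np.tensordot(A_mats, X, axes=([1, 2], [0, 1]))


def factored_objective(U, V, A_mats, b, lam=0.25):
    """Assumed objective: 0.5 * ||A(U V^T) - b||^2 + lam * ||U^T U - V^T V||_F^2."""
    residual = sensing_operator(A_mats, U @ V.T) - b
    fit = 0.5 * np.sum(residual ** 2)
    balance = np.linalg.norm(U.T @ U - V.T @ V, "fro") ** 2
    return fit + lam * balance


def balanced_factors(X, r):
    """Balanced rank-r factorization from the SVD: U^T U = V^T V = diag(singular values)."""
    Uf, s, Vt = np.linalg.svd(X, full_matrices=False)
    root = np.sqrt(s[:r])
    return Uf[:, :r] * root, Vt[:r].T * root


rng = np.random.default_rng(0)
m, n, r, p = 8, 6, 2, 200

# Rank-r ground truth and noiseless Gaussian measurements (an assumption).
X_star = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
A_mats = rng.standard_normal((p, m, n)) / np.sqrt(p)
b = sensing_operator(A_mats, X_star)

U_star, V_star = balanced_factors(X_star, r)
value_at_truth = factored_objective(U_star, V_star, A_mats, b)

# Rescaling the factors leaves U V^T unchanged but breaks the balance.
value_unbalanced = factored_objective(3.0 * U_star, V_star / 3.0, A_mats, b)

print("objective at balanced truth:  ", value_at_truth)
print("objective at rescaled factors:", value_unbalanced)
```

Both factor pairs reproduce the same $X$, so the data-fit term vanishes in each case. Only the regularizer tells them apart, which is the role of the balancing term: it removes the scaling ambiguity of $UV^\top$.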

Paper Structure

This paper contains 17 sections, 8 theorems, 53 equations.

Key Result

Proposition 1.2

For a linear operator $\mathcal{A} :~\mathbb{R}^{m \times n} \rightarrow \mathbb{R}^p$ that satisfies the restricted isometry property on rank-$r$ matrices, the following inequality holds for any two rank-$r$ matrices $X, ~Y \in \mathbb{R}^{m \times n}$:

Theorems & Definitions (15)

  • Definition 1.1: Restricted Isometry Property (RIP)
  • Proposition 1.2: Useful property due to RIP
  • Proposition 1.3
  • proof
  • Theorem 2.1
  • Corollary 2.2
  • Remark 1: Noiseless matrix sensing
  • Remark 2: Noisy matrix sensing
  • Remark 3: High-rank matrix sensing
  • Lemma 3.1
  • ...and 5 more
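
For orientation, RIP in its usual form asks that $(1 - \delta_r)\|X\|_F^2 \le \|\mathcal{A}(X)\|_2^2 \le (1 + \delta_r)\|X\|_F^2$ for every matrix $X$ of rank at most $r$. The sketch below assumes a Gaussian operator scaled by $1/\sqrt{p}$ (a common example, not something the summary specifies) and measures how far the ratio $\|\mathcal{A}(X)\|_2^2 / \|X\|_F^2$ strays from 1 on random rank-$r$ matrices. Sampling gives only a lower estimate of $\delta_r$, not a certificate, since the definition quantifies over all low-rank matrices. The function names are hypothetical.

```python
import numpy as np


def random_rank_r(rng, m, n, r):
    """Draw a random m x n matrix of rank at most r."""
    return rng.standard_normal((m, r)) @ rng.standard_normal((r, n))


def isometry_ratios(A_mats, matrices):
    """Return ||A(X)||^2 / ||X||_F^2 for each X, where A(X)_i = <A_i, X>."""
    ratios = []
    for X in matrices:
        AX = np.tensordot(A_mats, X, axes=([1, 2], [0, 1]))
        ratios.append(np.sum(AX ** 2) / np.linalg.norm(X, "fro") ** 2)
    return np.array(ratios)


rng = np.random.default_rng(1)
m, n, r, p = 10, 10, 2, 2000

# Gaussian measurement matrices, scaled so that E||A(X)||^2 = ||X||_F^2.
A_mats = rng.standard_normal((p, m, n)) / np.sqrt(p)

samples = [random_rank_r(rng, m, n, r) for _ in range(50)]
ratios = isometry_ratios(A_mats, samples)

# Largest observed deviation from 1: an empirical lower estimate of delta_r.
empirical_delta = np.max(np.abs(ratios - 1.0))
print("empirical deviation from isometry:", empirical_delta)
```

Increasing `p` tightens the ratios around 1, which matches the intuition that more measurements give a smaller isometry constant.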