Can Learning Be Explained By Local Optimality In Robust Low-rank Matrix Recovery?

Jianhao Ma; Salar Fattahi

TL;DR

This work analyzes robust low-rank matrix recovery under nonsmooth $\ell_1$-loss, showing that ground-truth matrices $X^\star$ typically do not appear as local optima but as strict saddles in many regimes. By formulating a Burer–Monteiro factorization and leveraging parametric perturbation constructions, the authors establish precise sample-size thresholds separating regimes where true solutions are non-optimal, strict saddles, or global minima across symmetric/asymmetric sensing and completion. Key contributions include tight landscape characterizations under Gaussian sensing and elementwise completion, and proofs of matching lower bounds, demonstrating the nontrivial role of rank, coherence, and noise in shaping the optimization geometry. The findings challenge the belief that saddle points are universally detrimental and highlight nuanced implications for learning dynamics, including potential explanations for why simple subgradient methods can converge to true solutions in the presence of outliers, as well as directions for future work on saddle-escape and early stopping strategies.

Abstract

We explore the local landscape of low-rank matrix recovery, focusing on reconstructing a $d_1\times d_2$ matrix $X^\star$ with rank $r$ from $m$ linear measurements, some potentially noisy. When the noise is distributed according to an outlier model, minimizing a nonsmooth $\ell_1$-loss with a simple sub-gradient method can often perfectly recover the ground truth matrix $X^\star$. Given this, a natural question is what optimization property (if any) enables such learning behavior. The most plausible answer is that the ground truth $X^\star$ manifests as a local optimum of the loss function. In this paper, we provide a strong negative answer to this question, showing that, under moderate assumptions, the true solutions corresponding to $X^\star$ do not emerge as local optima, but rather as strict saddle points -- critical points with strictly negative curvature in at least one direction. Our findings challenge the conventional belief that all strict saddle points are undesirable and should be avoided.
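For concreteness, the recovery problem described above can be written out as follows. This is a sketch in assumed notation, covering only the symmetric case with a Burer–Monteiro factor $U$: the symbols $A_i$, $y_i$, $s_i$, $U$, $d$, $k$ and the $1/m$ normalization are illustrative choices, not taken from the text above.

```latex
% Measurement model: m linear measurements of X^\star, a subset corrupted by outliers s_i
y_i = \langle A_i, X^\star \rangle + s_i, \qquad i = 1, \dots, m,

% Nonsmooth \ell_1-loss over the factor U \in \mathbb{R}^{d \times k}
\min_{U \in \mathbb{R}^{d \times k}} \; f(U)
  = \frac{1}{m} \sum_{i=1}^{m} \bigl| \, y_i - \langle A_i, U U^\top \rangle \, \bigr| .
```

Under this convention, a true solution is any $U$ with $UU^\top = X^\star$, and the question posed in the abstract is whether such points are local minima of $f$.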

Paper Structure

This paper contains 42 sections, 41 theorems, 114 equations, 1 figure, and 1 table.

Introduction
Robust Low-rank Matrix Recovery
Summary of Contributions
Symmetric matrix sensing
Asymmetric matrix sensing
Symmetric matrix completion
Asymmetric matrix completion
Proof Idea: Parametric Perturbation Sets
Related Work
Effect of over-parameterization
Effect of noise
Convergence guarantees
Notations
Main Results
Problem Formulation
...and 27 more sections

Key Result

Theorem 1

Consider BM-sensing-sym with measurement matrices satisfying asp_sensing. Suppose that the noise satisfies assumption::general-noise. Suppose that $k > r$ and $m \lesssim p p_0 d(k-r)$. With probability at least $1-\exp(-\Omega(d(k-r)))-\exp(-\Omega(p p_0 m))$, none of the true solutions in $\mathcal{W} \dots$

Figures (1)

Figure 1: We apply the sub-gradient method with an exponentially decaying stepsize to noisy instances of symmetric and asymmetric matrix sensing with $\ell_1$-loss. When the algorithm is initialized randomly but close enough to the true solution, a significant portion of the trajectories converge to the true solution.
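The experiment in the caption can be sketched for the symmetric case as below. This is a minimal illustration, not the authors' code: the Gaussian measurement model, the outlier model, the problem sizes, the stepsize constants, and all function names (`make_instance`, `l1_loss`, `l1_subgradient`, `subgradient_method`) are assumptions made for this example.

```python
import numpy as np


def make_instance(d=10, r=2, m=400, corruption=0.1, outlier_scale=10.0, seed=0):
    """Hypothetical noisy symmetric sensing instance (all sizes are assumptions)."""
    rng = np.random.default_rng(seed)
    # Ground truth X* = U* U*^T with orthonormal U*, so X* has rank r.
    U_star, _ = np.linalg.qr(rng.standard_normal((d, r)))
    X_star = U_star @ U_star.T
    A = rng.standard_normal((m, d * d))  # each row is a flattened Gaussian A_i
    y = A @ X_star.ravel()               # clean measurements <A_i, X*>
    # Assumed outlier model: a random subset of measurements gets large noise.
    outliers = rng.random(m) < corruption
    y[outliers] += outlier_scale * rng.standard_normal(outliers.sum())
    return A, y, X_star, U_star


def l1_loss(U, A, y):
    """f(U) = (1/m) * sum_i |<A_i, U U^T> - y_i|."""
    residual = A @ (U @ U.T).ravel() - y
    return np.mean(np.abs(residual))


def l1_subgradient(U, A, y):
    """One subgradient of f at U: (G + G^T) U with G = (1/m) sum_i sign(res_i) A_i."""
    d = U.shape[0]
    residual = A @ (U @ U.T).ravel() - y
    G = (A.T @ np.sign(residual)).reshape(d, d) / len(y)
    return (G + G.T) @ U


def subgradient_method(A, y, U0, eta0=0.02, decay=0.98, iters=400):
    """Subgradient method with exponentially decaying stepsize eta0 * decay**t."""
    U = U0.copy()
    for t in range(iters):
        U = U - eta0 * decay**t * l1_subgradient(U, A, y)
    return U


if __name__ == "__main__":
    A, y, X_star, U_star = make_instance()
    rng = np.random.default_rng(1)
    # Random initialization close to a true solution, as in the caption.
    U0 = U_star + 0.03 * rng.standard_normal(U_star.shape)
    U = subgradient_method(A, y, U0)
    print("initial error:", np.linalg.norm(U0 @ U0.T - X_star))
    print("final error:  ", np.linalg.norm(U @ U.T - X_star))
```

The number of columns of `U0` sets the search rank $k$. The demo uses $k = r$; passing a wider `U0` gives an over-parameterized run with $k > r$, the setting of Theorem 1. The stepsize constants were chosen by hand for this toy instance and would likely need retuning for other sizes or corruption levels.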

Theorems & Definitions (49)

Theorem 1: Sub-optimality of true solutions for symmetric matrix sensing
Theorem 2: Optimality of true solutions for symmetric matrix sensing
Corollary 1
Theorem 3: Sub-optimality of true solutions for asymmetric matrix sensing
Theorem 4: Optimality of true solutions for asymmetric matrix sensing
Corollary 2
Theorem 5: Sub-optimality of true solutions for symmetric matrix completion
Definition 1
Theorem 6: Sub-optimality of true solutions for asymmetric matrix completion
Lemma 1
...and 39 more
