Escaping Local Minima Provably in Non-convex Matrix Sensing: A Deterministic Framework via Simulated Lifting

Tianqi Shen, Jinji Yang, Junze He, Kunhan Gao, Ziye Ma

TL;DR

The paper tackles nonconvex low-rank matrix sensing (MS), where spurious local minima trap gradient methods. It introduces Simulated Oracle Direction (SOD) Escape, a deterministic framework that mimics tensor lifting without performing the lifting, steering iterates toward the global optimum. Two complementary results are developed: a single-step escape with an Escape Feasibility Score (EFS) that certifies descent under a Gaussian sensing model, and a general multi-step scheme that simulates truncated projected gradient descent (TPGD) within a structured tensor subspace to guarantee objective decrease and a valid matrix-space escape. Numerical experiments on perturbed matrix completion and real-world MS problems demonstrate reliable escape from local minima and convergence to ground-truth solutions, with lower computational overhead than full tensor lifting. The approach offers a principled, deterministic, and computationally efficient way to exploit over-parameterization insights in nonconvex optimization beyond matrix sensing.

Abstract

Low-rank matrix sensing is a fundamental yet challenging nonconvex problem whose optimization landscape typically contains numerous spurious local minima, making it difficult for gradient-based optimizers to converge to the global optimum. Recent work has shown that over-parameterization via tensor lifting can convert such local minima into strict saddle points, an insight that also partially explains why massive scaling can improve generalization and performance in modern machine learning. Motivated by this observation, we propose a Simulated Oracle Direction (SOD) escape mechanism that simulates the landscape and escape direction of the over-parameterized space without actually lifting the problem, since lifting would be computationally intractable. In essence, we design a mathematical framework that projects over-parameterized escape directions onto the original parameter space to guarantee a strict decrease of the objective value from existing local minima. To the best of our knowledge, this is the first deterministic framework that escapes spurious local minima with guarantees, without using random perturbations or heuristic estimates. Numerical experiments demonstrate that our framework reliably escapes local minima and facilitates convergence to global optima, while incurring minimal computational cost compared to explicit tensor over-parameterization. We believe this framework has non-trivial implications for nonconvex optimization beyond matrix sensing, by showcasing how simulated over-parameterization can be leveraged to tame challenging optimization landscapes.
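To make the problem setting concrete, the following is a minimal sketch of a factorized matrix sensing objective and its gradient. It assumes the common formulation $f(X) = \tfrac{1}{2}\sum_i (\langle A_i, XX^\top\rangle - b_i)^2$ with $X \in \mathbb{R}^{n \times r}$; the exact objective and scaling used in the paper may differ, and the names `sensing_residual`, `sensing_loss`, and `sensing_grad` are hypothetical. It does not implement SOD Escape itself.

```python
import numpy as np

def sensing_residual(X, A, b):
    # r_i = <A_i, X X^T> - b_i for each of the m measurement matrices A_i
    return np.einsum("ijk,jk->i", A, X @ X.T) - b

def sensing_loss(X, A, b):
    # Assumed objective: half the sum of squared residuals
    return 0.5 * np.sum(sensing_residual(X, A, b) ** 2)

def sensing_grad(X, A, b):
    # d<A_i, X X^T>/dX = (A_i + A_i^T) X, so grad f = (S + S^T) X
    # with S = sum_i r_i A_i
    S = np.einsum("i,ijk->jk", sensing_residual(X, A, b), A)
    return (S + S.T) @ X

# Small synthetic instance (Gaussian measurements, an assumption for illustration)
rng = np.random.default_rng(0)
m, n, r = 30, 6, 2
A = rng.standard_normal((m, n, n))
X_star = rng.standard_normal((n, r))
b = np.einsum("ijk,jk->i", A, X_star @ X_star.T)

print(sensing_loss(X_star, A, b))  # zero at the ground truth by construction
X = rng.standard_normal((n, r))
print(np.linalg.norm(sensing_grad(X, A, b)))  # generally nonzero away from it
```

A plain gradient method would iterate `X -= step * sensing_grad(X, A, b)`; the spurious local minima discussed above are points where this gradient vanishes and the Hessian is positive semidefinite even though `X @ X.T` differs from the ground truth.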

Paper Structure

This paper contains 58 sections, 1 theorem, 154 equations, 10 figures, 5 tables, and 1 algorithm.

Key Result

Lemma 1

If $\mathbf{w} \in \mathbb{R}^{nr \circ l}$, then the gradient of $h^l(\mathbf{w})$ is given by a closed-form expression (the displayed equation is not reproduced here), where $\mathbf{A}_r$ is defined as $I_r \oslash_{2,3} \mathbf{A} \in \mathbb{R}^{m \times nr \times nr}$, and $\oslash_{2,3}$ denotes the Kronecker product applied only to the last two dimensions of $\mathbf{A}$. The operator $\tilde{\mathcal{M}}$ is defined in Equation \ref{equation:auxiliary-tensor-fun}.
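The construction of $\mathbf{A}_r$ can be illustrated with a short sketch. It assumes that "Kronecker product applied only to the last two dimensions" means forming $I_r \otimes A_i$ for each slice $A_i \in \mathbb{R}^{n \times n}$ of $\mathbf{A}$, paired with a column-major vectorization of $X$; both conventions are assumptions, and the helper name `kron_last_two` is hypothetical. Under these assumptions, each measurement $\langle A_i, XX^\top\rangle$ becomes a quadratic form in $\mathrm{vec}(X) \in \mathbb{R}^{nr}$. The sketch does not compute the lifted gradient of $h^l$.

```python
import numpy as np

def kron_last_two(A, r):
    # Map an (m, n, n) tensor to (m, n*r, n*r) by taking I_r (x) A_i per slice.
    # Slice-wise ordering I_r (x) A_i is an assumed convention.
    I = np.eye(r)
    return np.stack([np.kron(I, A_i) for A_i in A])

rng = np.random.default_rng(0)
m, n, r = 4, 3, 2
A = rng.standard_normal((m, n, n))
X = rng.standard_normal((n, r))

A_r = kron_last_two(A, r)
x = X.flatten(order="F")  # column-major vec(X), length n*r

# Measurements in matrix form: <A_i, X X^T>
matrix_form = np.einsum("ijk,jk->i", A, X @ X.T)
# Same measurements as quadratic forms: vec(X)^T (I_r (x) A_i) vec(X)
vector_form = np.einsum("j,ijk,k->i", x, A_r, x)

print(A_r.shape)
print(np.allclose(matrix_form, vector_form))
```

The identity behind the check is $\mathrm{vec}(X)^\top (I_r \otimes A_i)\,\mathrm{vec}(X) = \mathrm{tr}(X^\top A_i X) = \langle A_i, XX^\top\rangle$, which is why working with $\mathbf{A}_r$ lets the sensing operator act directly on the vectorized factor.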

Figures (10)

  • Figure 1: Escape behavior of the proposed SOD method from a spurious solution $\hat{x}$. Left: distance to $M^\star$ versus iterations. Right: loss landscape (log-scaled contour map) and trajectories. SOD jumps from $\hat{x}$ to $\check{x}$ (green) and subsequent GD converges to $M^\star$, whereas GD alone is trapped (blue) and SGD does not escape the basin within the iteration budget.
  • Figure 2: Comparison of optimization landscapes in matrix and tensor spaces. Left: in matrix space, GD (purple) becomes trapped at a local minimum (dark blue), away from the global minimum (red pentagram). Right: in tensor space, the corresponding point becomes a saddle (light blue), and GD (green) traverses it and converges to the global minimum. This schematic is illustrative; real landscapes are far more complex.
  • Figure 3: Visualization of escape regions (concentric circles) with EFS zones. The radius of the inner circle is $r_1$, defined in Equation \ref{definition:r_1}, and the radius of the outer circle is $r_2$, defined in Equation \ref{definition:r_2}. The greener the region, the higher the probability $\mathbb{P}(\mathrm{EFS} > 1)$.
  • Figure 4: Simulation of the TPGD trajectory in a basic example. (a) Evolution of the lifted objective value $h^l(\tilde{\mathbf{w}}^{(t)})$ as a function of the iteration count $t$. The objective decreases in the early stage, while for sufficiently large $t$, truncation errors accumulate and eventually lead to an increase in the objective value. (b) Norms of the three components in the decomposition of $\tilde{\mathbf{w}}^{(t)}$. As $t$ increases, the $\mathbf{b}$-term exhibits clear norm separation from the $\mathbf{a}$- and $\mathbf{c}$-terms, indicating the emergence of a $\beta$-type escape direction.
  • Figure 5: Visualization of the quadratic inequality in Equation \ref{equation:quadratic_inequality}.
  • ...and 5 more figures

Theorems & Definitions (17)

  • proof
  • remark 1
  • proof
  • proof
  • proof
  • remark 2
  • proof
  • proof
  • remark 3
  • remark 4
  • ...and 7 more