Improved Global Guarantees for the Nonconvex Burer--Monteiro Factorization via Rank Overparameterization

Richard Y. Zhang

TL;DR

This work analyzes the nonconvex Burer–Monteiro factorization for semidefinite-program-like objectives by studying $f(X)=φ(XX^{T})$ with $φ$ $L$-smooth and $μ$-strongly convex. It proves that a constant-factor overparameterization, specifically $r>\frac{1}{4}(L/μ-1)^{2}r^{\star}$, eliminates spurious local minima, so that local optimization converges globally from an arbitrary initialization. This improves on the previous threshold of $r\ge n$. A corollary shows that in the exact-parameterization regime with favorable conditioning ($L/μ<3$), no spurious local minima arise for $r^{\star}\le r$, highlighting a sharp dependence on conditioning. The author develops a two-stage SDP bounding framework and a valid inequality relating the invariants α, β to characterize counterexamples, showing how modest overparameterization reshapes the optimization landscape for large-scale SDP-like problems.

Abstract

We consider minimizing a twice-differentiable, $L$-smooth, and $μ$-strongly convex objective $φ$ over an $n\times n$ positive semidefinite matrix $M\succeq0$, under the assumption that the minimizer $M^{\star}$ has low rank $r^{\star}\ll n$. Following the Burer--Monteiro approach, we instead minimize the nonconvex objective $f(X)=φ(XX^{T})$ over a factor matrix $X$ of size $n\times r$. This substantially reduces the number of variables from $O(n^{2})$ to as few as $O(n)$ and also enforces positive semidefiniteness for free, but at the cost of giving up the convexity of the original problem. In this paper, we prove that if the search rank $r\ge r^{\star}$ is overparameterized by a \emph{constant factor} with respect to the true rank $r^{\star}$, namely as in $r>\frac{1}{4}(L/μ-1)^{2}r^{\star}$, then despite nonconvexity, local optimization is guaranteed to globally converge from any initial point to the global optimum. This significantly improves upon a previous rank overparameterization threshold of $r\ge n$, which we show is sharp in the absence of smoothness and strong convexity, but would increase the number of variables back up to $O(n^{2})$. Conversely, without rank overparameterization, we prove that such a global guarantee is possible if and only if $φ$ is almost perfectly conditioned, with a condition number of $L/μ<3$. Therefore, we conclude that a small amount of overparameterization can lead to large improvements in theoretical guarantees for the nonconvex Burer--Monteiro factorization.
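The rank threshold stated in the abstract can be turned into a small calculator. The following is a minimal sketch, not code from the paper: the function name `min_search_rank` and its argument names are hypothetical, and it only evaluates the inequality $r>\frac{1}{4}(L/μ-1)^{2}r^{\star}$ together with $r\ge r^{\star}$.

```python
import math


def min_search_rank(L, mu, r_star):
    """Smallest integer r with r >= r_star and r > (1/4) * (L/mu - 1)**2 * r_star.

    L and mu are the smoothness and strong-convexity constants of phi,
    r_star is the rank of the minimizer. (Hypothetical helper, for illustration.)
    """
    if mu <= 0 or L < mu or r_star < 1:
        raise ValueError("need L >= mu > 0 and r_star >= 1")
    bound = 0.25 * (L / mu - 1.0) ** 2 * r_star
    # The inequality is strict, so take the smallest integer strictly above the bound.
    r = math.floor(bound) + 1
    return max(r, r_star)


if __name__ == "__main__":
    for kappa in (1.0, 2.0, 3.0, 5.0):
        print(f"L/mu = {kappa}: r_star = 2 needs r >= {min_search_rank(kappa, 1.0, 2)}")
```

For a condition number below 3 the factor $\frac{1}{4}(L/μ-1)^{2}$ is below 1, so the helper returns $r^{\star}$ itself, which matches the exact-parameterization statement. The resulting factor has $nr$ variables, so the threshold is only an improvement over $r\ge n$ when the returned rank is below $n$.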

Paper Structure (11 sections, 15 theorems, 59 equations, 1 figure)

Introduction
Main result
Algorithmic implications
Related work
Rank overparameterization for semidefinite programs
Matrix sensing
Low-rank matrix recovery
Notations
Proof of the main result
Proof of the valid inequality over $\alpha$ and $\beta$ (Lemma \ref{lem:ab1})
Proof of the closed-form lower-bound (Lemma \ref{lem:abdef})

Key Result

Theorem 1 (Overparameterization)

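Let $\phi:\mathbb{S}^{n}\to\mathbb{R}$ be twice-differentiable, $L$-smooth and $\mu$-strongly convex, and let the minimizer $M^{\star}=\arg\min_{M\succeq0}\phi(M)$ have true rank $r^{\star}=\mathrm{rank}(M^{\star})$. If the search rank $r\ge r^{\star}$ satisfies $r>\frac{1}{4}(L/\mu-1)^{2}r^{\star}$, then $f(X)=\phi(XX^{T})$ has no spurious local minima.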

Figures (1)

Figure 1: Overparameterization eliminates spurious local minima. Stochastic gradient descent (SGD) with Nesterov momentum \cite{sutskever2013importance} is applied to an instance of $f(X)\overset{\mathrm{def}}{=}\phi(XX^{T})$ that has a spurious second-order point $X_{\mathrm{spur}}$ for $r=3$. (Left) With search rank $r=3$, SGD remains stuck at $X\approx X_{\mathrm{spur}}$, resulting in 55 failures out of 100 trials. (Right) Overparameterizing to $r=4$ eliminates $X_{\mathrm{spur}}$ as a spurious second-order point, and SGD now succeeds in all 100 trials. (Setup: $\phi(M)=\sum_{i,j=1}^{n}\phi_{i,j}(M)$ where $\phi_{i,j}(M)=\frac{1}{2}|\left\langle A^{(i,j)},M-M^{\star}\right\rangle |^{2}$ as in Example \ref{exa:overparam} with $n=5$, $r=3$, and $r^{\star}=2$. Each trial sets $V=0$, samples $X$ uniformly from $\|X-X_{\mathrm{spur}}\|_{F}\le0.1$, and then iterates $V_{\mathrm{new}}=\beta V-\alpha\nabla f_{i,j}(X)$ and $X_{\mathrm{new}}=X+\beta V_{\mathrm{new}}-\alpha\nabla f_{i,j}(X)$ with learning rate $\alpha=1\times10^{-1}$ and momentum $\beta=0.9$. The sample indices $i,j$ are randomly reshuffled once per epoch, where 1 epoch = 25 iterations.)
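The momentum update in the caption can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not a reproduction of the figure: the measurement matrices $A^{(i,j)}$ and the point $X_{\mathrm{spur}}$ are not given here, so the sketch swaps in the perfectly conditioned objective $\phi(M)=\frac{1}{2}\|M-M^{\star}\|_{F}^{2}$, uses full gradients instead of sampled $\nabla f_{i,j}$, and starts from a small random point. The helper names `make_target`, `f_and_grad`, and `nesterov` are hypothetical.

```python
import numpy as np


def make_target(n=5, r_star=2, seed=0):
    """Build a rank-r_star PSD target M_star with eigenvalues between 1.0 and 0.5.

    Assumption for illustration only; the figure's actual instance is not reproduced.
    """
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    eigs = np.linspace(1.0, 0.5, r_star)
    return (Q[:, :r_star] * eigs) @ Q[:, :r_star].T


def f_and_grad(X, M_star):
    """f(X) = phi(X X^T) with phi(M) = 0.5 * ||M - M_star||_F^2 (assumed objective)."""
    R = X @ X.T - M_star           # residual, symmetric
    value = 0.5 * np.sum(R ** 2)
    grad = 2.0 * R @ X             # chain rule: grad f(X) = 2 * grad phi(X X^T) X
    return value, grad


def nesterov(X0, M_star, lr=0.1, momentum=0.9, iters=2000):
    """Run the caption's update with full gradients (lr = alpha, momentum = beta)."""
    X = X0.copy()
    V = np.zeros_like(X)           # V = 0 at the start, as in the caption
    for _ in range(iters):
        _, g = f_and_grad(X, M_star)
        V = momentum * V - lr * g          # V_new = beta * V - alpha * grad
        X = X + momentum * V - lr * g      # X_new = X + beta * V_new - alpha * grad
    return X


if __name__ == "__main__":
    M_star = make_target()
    X0 = 0.1 * np.random.default_rng(1).standard_normal((5, 3))  # search rank r = 3
    X = nesterov(X0, M_star)
    print("initial f:", f_and_grad(X0, M_star)[0])
    print("final f:  ", f_and_grad(X, M_star)[0])
```

Because this stand-in objective has $L/\mu=1$, it has no spurious second-order points at any $r\ge r^{\star}$, so the run is expected to reach the global minimum; reproducing the 55 failures in the left panel would require the specific instance from the paper.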

Theorems & Definitions (28)

Theorem 1: Overparameterization
Corollary: Exact parameterization
Proposition: Strict saddle property
Lemma
Proof
Corollary: Restricted isometry property
Proof
Definition
Lemma: SDP formulation
Proof
...and 18 more
