Dropping Convexity for Faster Semi-definite Optimization
Srinadh Bhojanapalli, Anastasios Kyrillidis, Sujay Sanghavi
TL;DR
This work studies the minimization of a convex function over the PSD cone through the non-convex factorization X=UU^T, and introduces Factored Gradient Descent (FGD) with a specially designed step size. The authors prove that, locally, FGD achieves an O(1/k) convergence rate for smooth convex objectives and linear convergence under (m,r)-restricted strong convexity, with rates that depend on spectral properties of the optimum. They propose initialization schemes based on first-order information to guarantee a good starting point and demonstrate computational advantages over traditional SDP methods. Overall, the paper provides precise convergence guarantees for general convex objectives in the PSD setting and explains the practical performance observed in matrix sensing and related tasks.
Abstract
We study the minimization of a convex function $f(X)$ over the set of $n\times n$ positive semi-definite matrices when the problem is recast as $\min_U g(U) := f(UU^\top)$, with $U \in \mathbb{R}^{n \times r}$ and $r \leq n$. We study the performance of gradient descent on $g$, which we refer to as Factored Gradient Descent (FGD), under standard assumptions on the original function $f$. We provide a rule for selecting the step size and, with this choice, show that the local convergence rate of FGD mirrors that of standard gradient descent on the original $f$: i.e., after $k$ steps, the error is $O(1/k)$ for smooth $f$, and exponentially small in $k$ when $f$ is (restricted) strongly convex. In addition, we provide a procedure to initialize FGD for (restricted) strongly convex objectives when one only has access to $f$ via a first-order oracle; for several problem instances, such proper initialization leads to global convergence guarantees. FGD and similar procedures are widely used in practice for problems that can be posed as matrix factorization. To the best of our knowledge, this is the first paper to provide precise convergence rate guarantees for general convex functions under standard convex assumptions.
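To make the recasting $g(U) = f(UU^\top)$ concrete, the following is a minimal sketch of gradient descent on the factor $U$, not the authors' implementation. The objective $f(X) = \tfrac{1}{2}\|X - M\|_F^2$, the problem sizes, the perturbed initialization, and the particular step size (scaled inversely with the smoothness constant and the spectral norms at the starting point) are all assumptions chosen for illustration; the names `fgd`, `grad_f`, and `L_smooth` are hypothetical.

```python
import numpy as np

def fgd(grad_f, U0, eta, iters):
    """Plain gradient descent on g(U) = f(U U^T), given a gradient oracle for f."""
    U = U0.copy()
    for _ in range(iters):
        G = grad_f(U @ U.T)
        # Chain rule: grad g(U) = (G + G^T) U, which equals 2 G U when G is symmetric.
        U = U - eta * (G + G.T) @ U
    return U

# Illustrative problem (an assumption): f(X) = 0.5 * ||X - M||_F^2 with M PSD of rank r.
rng = np.random.default_rng(0)
n, r = 20, 3
U_star = rng.standard_normal((n, r))
M = U_star @ U_star.T

f = lambda X: 0.5 * np.linalg.norm(X - M, "fro") ** 2
grad_f = lambda X: X - M          # first-order oracle for f
L_smooth = 1.0                    # smoothness constant of this particular f

# Start near the optimum, since the guarantees discussed here are local.
U0 = U_star + 0.1 * rng.standard_normal((n, r))
X0 = U0 @ U0.T

# Illustrative conservative step size (an assumption, not necessarily the paper's rule).
eta = 1.0 / (16.0 * (L_smooth * np.linalg.norm(X0, 2)
                     + np.linalg.norm(grad_f(X0), 2)))

U = fgd(grad_f, U0, eta, iters=2000)
f_init, f_final = f(X0), f(U @ U.T)
print(f"f at start: {f_init:.3e}, f after FGD: {f_final:.3e}")
```

Each iteration costs a gradient evaluation plus an $n \times n$ by $n \times r$ product, and the iterate $UU^\top$ is positive semi-definite by construction, so no projection onto the PSD cone is needed.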
