Robust Second-Order Nonconvex Optimization and Its Application to Low Rank Matrix Sensing
Shuyao Li, Yu Cheng, Ilias Diakonikolas, Jelena Diakonikolas, Rong Ge, Stephen J. Wright
TL;DR
This work addresses robust optimization in nonconvex stochastic settings with adversarial outliers, focusing on finding approximate second-order stationary points (SOSPs) under strong contamination. It introduces a general framework that uses robust estimates of gradients and Hessians, with dimension-independent accuracy, to guide nonconvex optimization, achieving SOSP guarantees with $n = \widetilde{O}(D^2/ε)$ samples, where $D$ is the ambient dimension and $ε$ is the fraction of corrupted datapoints. The framework is then specialized to outlier-robust low-rank matrix sensing with Gaussian design, delivering exact recovery in the noiseless case and provable error bounds in the noisy case, with sample complexity $n = \widetilde{O}((d^2 r^2 + d r \log(Γ/ξ))/ε)$. A Statistical Query (SQ) lower bound provides evidence that the quadratic dimension dependence in the sample complexity is necessary for efficient SQ algorithms, pointing to an information-computation tradeoff. Overall, the paper advances robust nonconvex optimization by pairing dimension-independent SOSP guarantees under strong contamination with an SQ lower bound on sample complexity, with concrete implications for robust matrix sensing and related nonconvex problems.
Abstract
Finding an approximate second-order stationary point (SOSP) is a well-studied and fundamental problem in stochastic nonconvex optimization with many applications in machine learning. However, this problem is poorly understood in the presence of outliers, limiting the use of existing nonconvex algorithms in adversarial settings. In this paper, we study the problem of finding SOSPs in the strong contamination model, where a constant fraction of datapoints are arbitrarily corrupted. We introduce a general framework for efficiently finding an approximate SOSP with dimension-independent accuracy guarantees, using $\widetilde{O}(D^2/ε)$ samples where $D$ is the ambient dimension and $ε$ is the fraction of corrupted datapoints. As a concrete application of our framework, we apply it to the problem of low rank matrix sensing, developing efficient and provably robust algorithms that can tolerate corruptions in both the sensing matrices and the measurements. In addition, we establish a Statistical Query lower bound providing evidence that the quadratic dependence on $D$ in the sample complexity is necessary for computationally efficient algorithms.
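
For readers less familiar with the term, the sketch below illustrates one common definition of an approximate SOSP: a point where the gradient norm is at most a threshold and the smallest Hessian eigenvalue is at least the negative of a second threshold. The names `is_approx_sosp`, `eps_g`, and `eps_H` are illustrative assumptions rather than the paper's notation, the thresholds are unrelated to the corruption fraction $ε$, and the paper's exact parametrization may differ. The two toy functions are chosen only to contrast a saddle point with a local minimum.

```python
import numpy as np


def is_approx_sosp(grad, hess, eps_g, eps_H):
    """Check a common approximate-SOSP condition (assumed definition):
    ||grad|| <= eps_g  and  lambda_min(hess) >= -eps_H.
    """
    grad_norm = np.linalg.norm(grad)
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    lam_min = np.linalg.eigvalsh(hess)[0]
    return bool(grad_norm <= eps_g and lam_min >= -eps_H)


# Toy saddle: f(x) = x0^2 - x1^2 (the origin is first-order stationary only).
def saddle_grad(x):
    return np.array([2.0 * x[0], -2.0 * x[1]])


def saddle_hess(x):
    return np.array([[2.0, 0.0], [0.0, -2.0]])


# Toy bowl: f(x) = x0^2 + x1^2 (the origin is a local minimum).
def bowl_grad(x):
    return np.array([2.0 * x[0], 2.0 * x[1]])


def bowl_hess(x):
    return np.array([[2.0, 0.0], [0.0, 2.0]])


origin = np.zeros(2)
saddle_ok = is_approx_sosp(saddle_grad(origin), saddle_hess(origin), 1e-3, 1e-3)
bowl_ok = is_approx_sosp(bowl_grad(origin), bowl_hess(origin), 1e-3, 1e-3)
print("saddle origin is approx SOSP:", saddle_ok)
print("bowl origin is approx SOSP:", bowl_ok)
```

The saddle's origin has zero gradient but a Hessian eigenvalue of -2, so it fails the second-order check, while the bowl's origin passes both. In the setting described above, the exact gradient and Hessian are unavailable and would be replaced by robust estimates computed from corrupted samples; this sketch does not implement that estimation step.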
