The effect of smooth parametrizations on nonconvex optimization landscapes
Eitan Levin, Joe Kileel, Nicolas Boumal
TL;DR
The paper develops a general theory for comparing nonconvex optimization landscapes across two problems linked by a smooth lift $\varphi$, showing that the relation between desirable points often hinges on the lift geometry rather than the cost function. It introduces $\mathbf{L}_y$ and $\mathbf{Q}_y$ mappings to connect gradients and Hessians, and characterizes when $1$- and $2$-critical points map between problems, including openness, submersion properties, and tangent-cone conditions. The framework is applied to a broad spectrum of lifts—including Hadamard, Burer–Monteiro, low-rank factorizations, and tensor or neural-network parametrizations—yielding new guarantees and clarifying why some lifts induce benign nonconvexity while others can create spurious critical points. The results inform the design and analysis of parametrizations for low-rank optimization, semidefinite programming, and symmetry quotienting, with implications for reliably leveraging smooth lifts in practical algorithms.
Abstract
We develop new tools to study landscapes in nonconvex optimization. Given one optimization problem, we pair it with another by smoothly parametrizing the domain. This is either for practical purposes (e.g., to use smooth optimization algorithms with good guarantees) or for theoretical purposes (e.g., to reveal that the landscape satisfies a strict saddle property). In both cases, the central question is: how do the landscapes of the two problems relate? More precisely: how do desirable points such as local minima and critical points in one problem relate to those in the other problem? A key finding in this paper is that these relations are often determined by the parametrization itself, and are almost entirely independent of the cost function. Accordingly, we introduce a general framework to study parametrizations by their effect on landscapes. The framework enables us to obtain new guarantees for an array of problems, some of which were previously treated on a case-by-case basis in the literature. Applications include: optimizing low-rank matrices and tensors through factorizations; solving semidefinite programs via the Burer-Monteiro approach; training neural networks by optimizing their weights and biases; and quotienting out symmetries.
