The effect of smooth parametrizations on nonconvex optimization landscapes

Eitan Levin; Joe Kileel; Nicolas Boumal

TL;DR

The paper develops a general theory for comparing nonconvex optimization landscapes across two problems linked by a smooth lift $\varphi$, showing that the relation between desirable points often hinges on the lift geometry rather than the cost function. It introduces $\mathbf{L}_y$ and $\mathbf{Q}_y$ mappings to connect gradients and Hessians, and characterizes when $1$- and $2$-critical points map between problems, including openness, submersion properties, and tangent-cone conditions. The framework is applied to a broad spectrum of lifts—including Hadamard, Burer–Monteiro, low-rank factorizations, and tensor or neural-network parametrizations—yielding new guarantees and clarifying why some lifts induce benign nonconvexity while others can create spurious critical points. The results inform the design and analysis of parametrizations for low-rank optimization, semidefinite programming, and symmetry quotienting, with implications for reliably leveraging smooth lifts in practical algorithms.
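
As a sketch of how a lift links the derivatives of the two costs, assume the Euclidean case where $\varphi$ is a smooth map between open subsets of linear spaces, and write $g = f \circ \varphi$ and $x = \varphi(y)$ (the names $g$, $x$ and $v$ are introduced here for illustration only). The chain rule then gives the identities below. The first-order map $v \mapsto \mathrm{D}\varphi(y)[v]$ and the second-order term $\mathrm{D}^2\varphi(y)[v,v]$ are plausibly the roles played by $\mathbf{L}_y$ and $\mathbf{Q}_y$, though the paper's precise definitions on manifolds are not reproduced in this summary.

```latex
\begin{align*}
\mathrm{D}g(y)[v] &= \mathrm{D}f(x)\big[\mathrm{D}\varphi(y)[v]\big], \\
\mathrm{D}^2 g(y)[v,v] &= \mathrm{D}^2 f(x)\big[\mathrm{D}\varphi(y)[v],\, \mathrm{D}\varphi(y)[v]\big]
  + \mathrm{D}f(x)\big[\mathrm{D}^2\varphi(y)[v,v]\big].
\end{align*}
```

In this setting, $\mathrm{D}f(x) = 0$ forces $\mathrm{D}g(y) = 0$ for any $\varphi$; the converse direction is where the properties of the lift come into play.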

Abstract

We develop new tools to study landscapes in nonconvex optimization. Given one optimization problem, we pair it with another by smoothly parametrizing the domain. This is either for practical purposes (e.g., to use smooth optimization algorithms with good guarantees) or for theoretical purposes (e.g., to reveal that the landscape satisfies a strict saddle property). In both cases, the central question is: how do the landscapes of the two problems relate? More precisely: how do desirable points such as local minima and critical points in one problem relate to those in the other problem? A key finding in this paper is that these relations are often determined by the parametrization itself, and are almost entirely independent of the cost function. Accordingly, we introduce a general framework to study parametrizations by their effect on landscapes. The framework enables us to obtain new guarantees for an array of problems, some of which were previously treated on a case-by-case basis in the literature. Applications include: optimizing low-rank matrices and tensors through factorizations; solving semidefinite programs via the Burer-Monteiro approach; training neural networks by optimizing their weights and biases; and quotienting out symmetries.

Paper Structure (27 sections, 35 theorems, 87 equations, 1 figure, 1 table)

Introduction
Lifts and their properties
The sphere-to-simplex Hadamard lift
Smooth semidefinite programs via Burer-Monteiro
Low-rank matrices
Low-rank tensors
Neural networks
Submersions and higher order stationary points
Characterizations of lifts
Local minima
Stationary points
"1 $\Rightarrow$ 1": Lifts preserving 1-critical points
"2 $\Rightarrow$ 1": Lifts mapping 2-critical points to 1-critical points
Composition of lifts
Computing $\mathbf{L}_y$ and $\mathbf{Q}_y$
...and 12 more sections

Key Result

Theorem 2.3

The lift $\varphi \colon \mathcal{M} \to \mathcal{X}$ satisfies "local $\Rightarrow$ local" at $y\in\mathcal{M}$ if and only if it is open at $y$. If $\varphi$ does not satisfy "local $\Rightarrow$ local" at $y$, there is a smooth cost $f$ such that $y$ is a local minimum for (Q) but $\varphi(y)$ [...]

Figures (1)

Figure 1: Nodal cubic in $\mathbb{R}^2$ as the shadow of its lift in $\mathbb{R}^3$, colored by the value of the function $f(x)=-x_1-x_2$. The highlighted points are $x=(0,0)$ (not stationary for (P)), and its two preimages on the lift, including $y=(0,0,1)$ (a spurious local minimum for (Q)).
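
The situation in Figure 1 can be checked numerically with a minimal sketch. The lift itself is not spelled out in this summary, so the code assumes a hypothetical parametrization $t \mapsto (t^2-1,\, -t(t^2-1),\, t)$ of the lifted curve, with $\varphi$ the projection onto the first two coordinates; the paper's own lift may use a different convention. The helper names (`lift_point`, `phi`, `g`, `on_cubic`) are illustrative. The check shows that $y=(0,0,1)$ is a local minimum of the lifted cost, while points of the cubic arbitrarily close to $x=(0,0)$ have strictly smaller cost, so $x$ is not a local minimum downstairs.

```python
# Hypothetical parametrization (an assumption, not taken from the paper):
#   M = {(t^2 - 1, -t (t^2 - 1), t) : t real} in R^3,
#   phi(y) = (y1, y2), whose image is the nodal cubic x2^2 = x1^2 (x1 + 1).

def lift_point(t):
    """Point of the assumed lifted curve M with third coordinate t."""
    return (t * t - 1.0, -t * (t * t - 1.0), t)

def phi(y):
    """Projection from R^3 to R^2 (the 'shadow' in the figure)."""
    return (y[0], y[1])

def f(x):
    """Cost from the figure caption: f(x) = -x1 - x2."""
    return -x[0] - x[1]

def g(t):
    """Lifted cost f o phi on M, written in terms of the parameter t."""
    return f(phi(lift_point(t)))

def on_cubic(x, tol=1e-9):
    """Check that x satisfies x2^2 = x1^2 (x1 + 1) up to tol."""
    return abs(x[1] ** 2 - x[0] ** 2 * (x[0] + 1.0)) < tol

steps = [10.0 ** (-k) for k in range(1, 6)]

# Both t = 1 and t = -1 are sent to the node x = (0, 0).
same_image = phi(lift_point(1.0)) == phi(lift_point(-1.0)) == (0.0, 0.0)

# Near t = 1, g(t) = (t - 1)^2 (t + 1) > g(1) = 0,
# so y = (0, 0, 1) is a local minimum on the assumed lift.
local_min_on_lift = all(
    g(1.0 + h) > g(1.0) and g(1.0 - h) > g(1.0) for h in steps
)

# Along the other branch (t slightly below -1), f drops below f(0, 0) = 0,
# so x = (0, 0) is not a local minimum on the cubic.
not_local_min_below = all(
    on_cubic(phi(lift_point(-1.0 - h))) and g(-1.0 - h) < 0.0 for h in steps
)

print(same_image, local_min_on_lift, not_local_min_below)
```

Under this assumed parametrization, $\varphi$ maps a neighborhood of $y=(0,0,1)$ onto only one of the two branches through the node, which is consistent with the lift failing to be open at $y$ in the sense of Theorem 2.3.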

Theorems & Definitions (87)

Definition 2.1
Definition 2.2: Desirable properties of lifts
Theorem 2.3
Theorem 2.4
Proposition 2.5
Proposition 2.6
Proposition 2.7
Proposition 2.8
Proposition 2.9
Proposition 2.10
...and 77 more
