A non-autonomous center-stable set theorem for saddle avoidance in optimization

Andreea-Alexandra Muşat; Nicolas Boumal

TL;DR

A new Center-Stable Set Theorem (CSST) is established for non-autonomous systems, used to prove saddle avoidance for gradient descent and for the proximal point method, without assuming Lipschitz gradients or isolated saddles, and allowing vanishing step sizes.

Abstract

Optimization algorithms are unlikely to converge to strict saddle points. Proofs to that effect rely on the Center-Stable Manifold Theorem (CSMT), casting algorithms as dynamical systems: $x_{k+1} = g_k(x_k)$. In its standard form, the CSMT is limited to autonomous systems (the maps $g_k$ are all the same). To study algorithms such as gradient descent with non-constant step-size schedules, we need a non-autonomous CSMT. There are a few, but they are unable to handle, for example, vanishing step sizes. To cover such scenarios, we establish a new Center-Stable Set Theorem (CSST) for non-autonomous systems. We use it to prove saddle avoidance for gradient descent (Euclidean and Riemannian) and for the proximal point method, without assuming Lipschitz gradients or isolated saddles, and allowing vanishing step sizes.
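To make the non-autonomous viewpoint $x_{k+1} = g_k(x_k)$ concrete, here is a minimal sketch on a toy problem that is not taken from the paper. It assumes the quadratic $f(x, y) = (x^2 - y^2)/2$, which has a strict saddle at the origin, and a hypothetical vanishing step-size schedule $\alpha_k = 1/(k+2)$; the names `grad`, `step_size` and `run_gd` are illustrative only. Each map $g_k(x) = x - \alpha_k \nabla f(x)$ differs from the next, so the iteration is non-autonomous.

```python
def grad(x, y):
    # Gradient of the toy function f(x, y) = (x**2 - y**2) / 2 (an assumption,
    # chosen because the origin is a strict saddle: Hessian eigenvalues +1, -1).
    return x, -y


def step_size(k):
    # Hypothetical vanishing schedule alpha_k = 1 / (k + 2).
    # It tends to zero, but its sum diverges.
    return 1.0 / (k + 2)


def run_gd(x0, y0, num_iters):
    # Iterate x_{k+1} = g_k(x_k), where g_k(x) = x - alpha_k * grad f(x).
    # The map g_k changes with k because the step size does.
    x, y = x0, y0
    for k in range(num_iters):
        gx, gy = grad(x, y)
        a = step_size(k)
        x, y = x - a * gx, y - a * gy
    return x, y


if __name__ == "__main__":
    # Initial point exactly on the x-axis: the iterates stay on it
    # and approach the saddle at the origin.
    print(run_gd(1.0, 0.0, 1000))
    # Tiny perturbation off the x-axis: the y-coordinate grows,
    # so the iterates move away from the saddle.
    print(run_gd(1.0, 1e-6, 1000))
```

For this particular schedule the products telescope, giving $x_n = x_0/(n+1)$ and $y_n = y_0\,(n+2)/2$. Only initial points with $y_0 = 0$, a set of measure zero, are drawn to the saddle. Escape is slow here because the step sizes vanish, which hints at why such schedules need a dedicated analysis.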

Paper Structure (34 sections, 24 theorems, 104 equations)

Introduction
The standard CSMT and some variants.
Applications in optimization.
A non-autonomous avoidance theorem for unstable fixed points
A local result
A global result
The graph transform method for autonomous systems
A graph invariance property
Function space setup for the graph transform
The potential
Exploiting the graph transform with non-autonomous systems
General potential growth inequality.
First use of the non-summability assumption.
Implications for bounded trajectories.
Building the functions $\varphi_0, \varphi_1, \ldots$
...and 19 more sections

Key Result

Theorem 2.6

Let $\big( (g_k, T_k \colon \mathbb{R}^d \to \mathbb{R}^d) \big)_{k \geq 0}$ be a jointly globally pseudo-hyperbolic sequence of pairs $(g_k, T_k) \in \mathrm{PH}(\mu_k, \lambda_k, \varepsilon_k; \mathbb{R}^d, E_{\mathrm{cs}} \oplus E_{\mathrm{u}})$ with constants satisfying $\sum_{k=0}^\infty \cdots$ …

Theorems & Definitions (57)

Definition 2.1
Definition 2.2
Definition 2.3
Definition 2.4
Definition 2.5
Theorem 2.6
Definition 2.7
Remark 2.8
Theorem 2.9: non-autonomous center-stable set theorem
proof
...and 47 more
