A non-autonomous center-stable set theorem for saddle avoidance in optimization

Andreea-Alexandra Muşat, Nicolas Boumal

TL;DR

A new Center-Stable Set Theorem (CSST) is established for non-autonomous systems and used to prove saddle avoidance for gradient descent and for the proximal point method, without assuming Lipschitz gradients or isolated saddles, and allowing vanishing step sizes.

Abstract

Optimization algorithms are unlikely to converge to strict saddle points. Proofs to that effect rely on the Center-Stable Manifold Theorem (CSMT), casting algorithms as dynamical systems: $x_{k+1} = g_k(x_k)$. In its standard form, the CSMT is limited to autonomous systems (the maps $g_k$ are all the same). To study algorithms such as gradient descent with non-constant step-size schedules, we need a non-autonomous CSMT. A few non-autonomous versions exist, but they cannot handle, for example, vanishing step sizes. To cover such scenarios, we establish a new Center-Stable Set Theorem (CSST) for non-autonomous systems. We use it to prove saddle avoidance for gradient descent (Euclidean and Riemannian) and for the proximal point method, without assuming Lipschitz gradients or isolated saddles, and allowing vanishing step sizes.
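To make the dynamical-systems view concrete, here is a minimal sketch assuming a hypothetical toy objective $f(x, y) = x^2/2 + y^4/4 - y^2/2$ and a hypothetical schedule $\alpha_k = 0.1/\sqrt{k+1}$ (neither comes from the paper). Gradient descent and the proximal point method are each written as a map $g_k$ that changes with $k$, so the iteration $x_{k+1} = g_k(x_k)$ is non-autonomous and the step sizes vanish while still summing to infinity. The origin is a strict saddle (Hessian $\mathrm{diag}(1, -1)$); for starts in $[-1, 1]^2$, only those on the measure-zero line $y = 0$ converge to it. The sketch only illustrates the phenomenon and does not check the hypotheses of the paper's theorems.

```python
import math
import random

# Hypothetical toy objective (illustration only, not from the paper):
#   f(x, y) = x**2 / 2 + y**4 / 4 - y**2 / 2
# Strict saddle at (0, 0) with Hessian diag(1, -1); minimizers at (0, 1), (0, -1).


def grad_f(x, y):
    return x, y**3 - y


def step_size(k, alpha0=0.1):
    # Hypothetical vanishing, non-summable schedule: alpha_k -> 0, sum alpha_k = inf.
    return alpha0 / math.sqrt(k + 1)


def gd_map(k, x, y):
    # Gradient descent step g_k(z) = z - alpha_k * grad f(z); depends on k.
    gx, gy = grad_f(x, y)
    a = step_size(k)
    return x - a * gx, y - a * gy


def prox_map(k, x, y):
    # Proximal point step g_k(z) = argmin_w f(w) + ||w - z||**2 / (2 * alpha_k).
    # Since alpha_k <= 0.1 < 1, the subproblem is strongly convex (unique minimizer).
    a = step_size(k)
    x_new = x / (1.0 + a)  # closed form for the quadratic x-part
    # y-part: root of h(w) = w**3 + (1/a - 1) * w - y / a, strictly increasing in w.
    w = y
    for _ in range(20):  # Newton's method, started at the current point
        h = w**3 + (1.0 / a - 1.0) * w - y / a
        dh = 3.0 * w**2 + (1.0 / a - 1.0)
        w -= h / dh
    return x_new, w


def run(step_map, x, y, num_iters=10_000):
    # Iterate the non-autonomous system z_{k+1} = g_k(z_k).
    for k in range(num_iters):
        x, y = step_map(k, x, y)
    return x, y


if __name__ == "__main__":
    rng = random.Random(0)
    for name, step_map in [("GD", gd_map), ("prox", prox_map)]:
        for _ in range(3):
            x0, y0 = rng.uniform(-1, 1), rng.uniform(-1, 1)
            xf, yf = run(step_map, x0, y0)
            print(f"{name}: ({x0:+.3f}, {y0:+.3f}) -> ({xf:+.1e}, {yf:+.6f})")
        # Starting exactly on the line y = 0, the iterates converge to the saddle.
        print(name, run(step_map, 0.5, 0.0))
```

Random starts end near $(0, \pm 1)$, while the start $(0.5, 0)$ ends at the saddle. The schedule is chosen so that $\sum_k \alpha_k$ diverges; with a summable schedule, the iterates could stall before reaching any critical point.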

Paper Structure

This paper contains 34 sections, 24 theorems, and 104 equations.

Key Result

Theorem 2.6

Let $( (g_k, T_k \colon {\mathbb{R}}^d \to {\mathbb{R}}^d) )_{k \geq 0}$ be a jointly globally pseudo-hyperbolic sequence of pairs $(g_k, T_k) \in \mathrm{PH}(\mu_k, \lambda_k, \varepsilon_k; {\mathbb{R}}^d, E_{\mathrm{cs}} \oplus E_{\mathrm{u}})$ with constants satisfying $\sum_{k=0}^\infty \frac{\cdots}{\cdots}$ …

Theorems & Definitions (57)

  • Definition 2.1
  • Definition 2.2
  • Definition 2.3
  • Definition 2.4
  • Definition 2.5
  • Theorem 2.6
  • Definition 2.7
  • Remark 2.8
  • Theorem 2.9: non-autonomous center-stable set theorem
  • Proof
  • ...and 47 more