A non-autonomous center-stable set theorem for saddle avoidance in optimization
Andreea-Alexandra Muşat, Nicolas Boumal
TL;DR
A new Center-Stable Set Theorem (CSST) is established for non-autonomous systems and used to prove saddle avoidance for gradient descent and the proximal point method, without assuming Lipschitz gradients or isolated saddles, and allowing vanishing step sizes.
Abstract
Optimization algorithms are unlikely to converge to strict saddle points. Proofs to that effect rely on the Center-Stable Manifold Theorem (CSMT), casting algorithms as dynamical systems: $x_{k+1} = g_k(x_k)$. In its standard form, the CSMT is limited to autonomous systems (the maps $g_k$ are all the same). To study algorithms such as gradient descent with non-constant step-size schedules, we need a non-autonomous CSMT. There are a few, but they are unable to handle, for example, vanishing step sizes. To cover such scenarios, we establish a new Center-Stable Set Theorem (CSST) for non-autonomous systems. We use it to prove saddle avoidance for gradient descent (Euclidean and Riemannian) and for the proximal point method, without assuming Lipschitz gradients or isolated saddles, and allowing vanishing step sizes.
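
To make the dynamical-systems view $x_{k+1} = g_k(x_k)$ concrete, here is a minimal sketch of gradient descent with a vanishing step-size schedule near a strict saddle. It is an illustration only and is not taken from the paper: the toy function $f(x, y) = (x^2 - y^2)/2$, the schedule $\alpha_k = 0.5/(k+1)$, and the names `grad_f`, `step_size`, `g`, and `run` are all assumptions chosen for this example.

```python
def grad_f(x, y):
    # Gradient of the toy function f(x, y) = (x**2 - y**2) / 2 (an assumption),
    # which has a strict saddle at the origin.
    return x, -y


def step_size(k):
    # Hypothetical vanishing schedule: alpha_k -> 0 while sum(alpha_k) diverges.
    return 0.5 / (k + 1)


def g(k, x, y):
    # The k-th map of the non-autonomous system: one gradient step with alpha_k.
    # The map changes with k because the step size does.
    a = step_size(k)
    gx, gy = grad_f(x, y)
    return x - a * gx, y - a * gy


def run(x0, y0, num_steps):
    # Iterate x_{k+1} = g_k(x_k) from the initial point (x0, y0).
    x, y = x0, y0
    for k in range(num_steps):
        x, y = g(k, x, y)
    return x, y


# Start exactly on the x-axis: the iterates stay on it and drift toward the saddle.
on_axis = run(1.0, 0.0, 10_000)
# Start slightly off the x-axis: the y-coordinate grows and the iterates escape.
off_axis = run(1.0, 1e-3, 10_000)
print("on axis :", on_axis)
print("off axis:", off_axis)
```

In this toy problem, the initial points that converge to the saddle are exactly those on the x-axis, a set of measure zero. Any other start sees its y-coordinate multiplied by $(1 + \alpha_k)$ at each step, so it grows (roughly like $\sqrt{k}$ for this schedule) even though the step sizes vanish.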
