Gradient Descent Converges to Minimizers
Jason D. Lee, Max Simchowitz, Michael I. Jordan, Benjamin Recht
TL;DR
The paper addresses saddle points in nonconvex optimization by treating gradient descent with random initialization and a small constant step size as a dynamical system. For functions with the strict saddle property, it proves that gradient descent almost surely avoids strict saddles, so the iterates either converge to a local minimizer or diverge. The proof applies the stable-manifold theorem after showing that the gradient map is a diffeomorphism when the step size is below 1/L, where L is the Lipschitz constant of the gradient. It further derives consequences under countable or isolated saddles and under coercivity with the Łojasiewicz gradient inequality, including explicit convergence rates. Overall, the results give global saddle-avoidance guarantees that require no added noise, with broad implications for nonconvex optimization and related algorithms.
Abstract
We show that gradient descent converges to a local minimizer, almost surely with random initialization. This is proved by applying the Stable Manifold Theorem from dynamical systems theory.
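The claim can be illustrated numerically with a minimal sketch. The test function f(x, y) = 0.5 x^2 + cos(y) is an assumption chosen for illustration and is not taken from the paper. Its gradient is 1-Lipschitz (L = 1), its critical points are (0, k*pi), those with even k are strict saddles (Hessian diag(1, -1)), and those with odd k are local minimizers. The helper names `grad`, `gradient_descent`, and `hessian_min_eig` are hypothetical, and the step size 0.5 is one choice below 1/L.

```python
import math
import random


def grad(x, y):
    """Gradient of the illustrative function f(x, y) = 0.5 * x**2 + cos(y)."""
    return x, -math.sin(y)


def gradient_descent(x, y, step=0.5, iters=500):
    """Plain gradient descent with a constant step size (here step < 1/L = 1)."""
    for _ in range(iters):
        gx, gy = grad(x, y)
        x, y = x - step * gx, y - step * gy
    return x, y


def hessian_min_eig(x, y):
    """Smallest Hessian eigenvalue; the Hessian of f is diag(1, -cos(y))."""
    return min(1.0, -math.cos(y))


# Random initializations: every run should end at a point with a
# positive-definite Hessian, i.e. a local minimizer, not a saddle.
rng = random.Random(0)
results = [
    gradient_descent(rng.uniform(-3.0, 3.0), rng.uniform(-3.0, 3.0))
    for _ in range(100)
]
all_minimizers = all(hessian_min_eig(x, y) > 0 for x, y in results)

# Initialization on the line y = 0, which is the stable manifold of the
# saddle (0, 0) and has measure zero: the iterates stay on it.
saddle_run = gradient_descent(1.0, 0.0)

print("all random runs reached a minimizer:", all_minimizers)
print("run started on the stable manifold ends at:", saddle_run)
```

In this sketch the only initializations that reach the saddle at the origin lie exactly on the line y = 0, which a continuous random initialization hits with probability zero. This is the measure-zero stable set that the Stable Manifold Theorem argument describes.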
