Avoidance of non-strict saddle points by blow-up

El Mehdi Achour, Umberto L. Hryniewicz, Michael Westdickenberg

TL;DR

The paper studies when gradient-flow trajectories avoid non-strict saddle points in nonconvex optimization. It introduces a blow-up technique: a nonlinear rescaling of the objective function around the saddle that makes the higher-order geometry visible and allows the center-stable manifold theorem to be applied. By analyzing the blown-up vector field and its spectrum, it derives conditions under which the set of initial points whose trajectories converge to the saddle has measure zero, so almost all trajectories avoid it. An explicit example illustrates the method, and the construction can be iterated to obtain finer tests when a first test is inconclusive. Overall, the approach provides concrete criteria and a geometric understanding of gradient-flow avoidance in degenerate settings, with potential implications for optimization dynamics in high dimensions.

Abstract

It is an old idea to use gradient flows or time-discretized variants thereof as methods for solving minimization problems. In some applications, for example in machine learning contexts, it is important to know that for generic initial data, gradient flow trajectories do not get stuck at saddle points. There are classical results concerned with the nondegenerate situation. But if the Hessian of the objective function has a nontrivial kernel at the critical point, then these results are inconclusive. In this paper, we show how relevant information can be extracted by "blowing up" the objective function around the non-strict saddle point, i.e., by a suitable nonlinear rescaling that makes the higher order geometry visible. Then the center-stable manifold theorem of dynamical system theory can be applied.
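To make the distinction in the abstract concrete, the following is a minimal Python sketch, not taken from the paper. It uses two hypothetical test functions chosen here for illustration: $f_1(x,y) = x^2 - y^2$, whose Hessian at the origin has a negative eigenvalue (strict saddle), and $f_2(x,y) = x^2 - y^4$, whose Hessian at the origin has a nontrivial kernel and no negative eigenvalue (non-strict saddle). The helper names `hessian`, `spectrum_at` and `escapes`, the explicit Euler step, and the tolerances are all assumptions of this sketch.

```python
import numpy as np

# Hypothetical examples (not from the paper): gradients of
# f1(x, y) = x^2 - y^2 and f2(x, y) = x^2 - y^4.
def grad_f1(w):
    x, y = w
    return np.array([2.0 * x, -2.0 * y])

def grad_f2(w):
    x, y = w
    return np.array([2.0 * x, -4.0 * y**3])

def hessian(grad, w, eps=1e-5):
    """Central-difference approximation of the Hessian from a gradient."""
    w = np.asarray(w, dtype=float)
    d = w.size
    H = np.zeros((d, d))
    for j in range(d):
        e = np.zeros(d)
        e[j] = eps
        H[:, j] = (grad(w + e) - grad(w - e)) / (2.0 * eps)
    return 0.5 * (H + H.T)  # symmetrize against rounding error

def spectrum_at(grad, w, tol=1e-6):
    """Return (has_negative_eigenvalue, kernel_dimension) at the point w."""
    eig = np.linalg.eigvalsh(hessian(grad, w))
    return bool(eig.min() < -tol), int(np.sum(np.abs(eig) <= tol))

def escapes(grad, w0, dt=0.01, max_steps=5000, radius=0.5):
    """Explicit Euler for w' = -grad f(w); True if |y| ever exceeds radius."""
    w = np.asarray(w0, dtype=float)
    for _ in range(max_steps):
        w = w - dt * grad(w)
        if abs(w[1]) > radius:
            return True
    return False

origin = np.zeros(2)
strict = spectrum_at(grad_f1, origin)      # negative eigenvalue, trivial kernel
non_strict = spectrum_at(grad_f2, origin)  # no negative eigenvalue, 1-dim kernel

# A start slightly off the x-axis leaves a neighborhood of the f2 saddle,
# while a start exactly on the x-axis converges to it.
generic_start_escapes = escapes(grad_f2, [1.0, 0.1])
axis_start_escapes = escapes(grad_f2, [1.0, 0.0])
```

In this toy case the Hessian test alone cannot detect the descent direction of $f_2$, yet the only initial points that converge to the origin lie on the $x$-axis, a set of measure zero. This is the kind of higher-order behavior that, according to the abstract, the blow-up is meant to make visible; the sketch itself does not implement the blow-up.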

Paper Structure

This paper contains 8 sections, 10 theorems, 146 equations, 1 figure.

Key Result

Theorem 1.2

Let $f: \mathbb{R}^d \longrightarrow \mathbb{R}$ be a smooth function and denote by $\varphi^t$ the flow of $-\nabla f$ (formed by gradient flow trajectories). Let $C \subset \mathrm{Crit}(f)$ be such that every $w_* \in C$ is a tamed weakly strict saddle point. Consider the set $E$ of points $w_0\i

Figures (1)

  • Figure 1: Left: Example of a strict saddle point at (0,0). Right: Example of a non-strict saddle point.

Theorems & Definitions (23)

  • Definition 1.1
  • Theorem 1.2
  • Theorem 1.3
  • proof: Proof of Theorem \ref{main_thm_larger_sets}
  • Theorem 1.4
  • proof
  • Lemma 2.1
  • proof
  • Definition 2.2
  • Remark 2.3
  • ...and 13 more