
Saddle Point Evasion via Curvature-Regularized Gradient Dynamics

Liraz Mudrik, Isaac Kaminer, Sean Kragelund, Abram H. Clark

Abstract

Nonconvex optimization underlies many modern machine learning and control tasks, where saddle points pose the dominant obstacle to reliable convergence in high-dimensional settings. Escaping these saddle points deterministically and at a controllable rate remains an open challenge: gradient descent is blind to curvature, stochastic perturbation methods lack deterministic guarantees, and Newton-type approaches suffer from Hessian singularity. We present Curvature-Regularized Gradient Dynamics (CRGD), which augments the objective with a smooth penalty on the most negative Hessian eigenvalue, yielding an augmented cost that serves as an optimization Lyapunov function with user-selectable convergence rates to second-order stationary points. Numerical experiments on a nonconvex matrix factorization example confirm that CRGD escapes saddle points across all tested configurations, with escape time that scales with the eigenvalue gap and therefore shrinks as the gap closes, in contrast to gradient descent, whose escape time grows inversely with the gap.
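One way to read "user-selectable convergence rates" is through a standard optimization-Lyapunov construction, sketched here as an assumption rather than as CRGD's confirmed update law: rescale the normalized gradient so that $\Phi$ obeys a chosen scalar ODE. The rate function $r$, the constants $c$, $\alpha$, $T$, and the target value $\Phi = 0$ are illustrative. The construction is only defined where $\nabla \Phi \neq 0$, which is the kind of property Propositions III.4 and III.5 concern.

```latex
\dot{\mathbf{x}} = -\,r\bigl(t,\Phi(\mathbf{x})\bigr)\,
  \frac{\nabla\Phi(\mathbf{x})}{\lVert\nabla\Phi(\mathbf{x})\rVert^{2}}
\quad\Longrightarrow\quad
\dot{\Phi} = \nabla\Phi(\mathbf{x})^{\top}\dot{\mathbf{x}} = -\,r(t,\Phi),
\qquad \Phi_0 = \Phi\bigl(\mathbf{x}(0)\bigr),

\begin{aligned}
&\text{exponential:}     && r = c\,\Phi,                          && \Phi(t) = \Phi_0\,e^{-ct},\\
&\text{finite-time:}     && r = c\,\Phi^{\alpha},\ 0<\alpha<1,    && \Phi(t)^{1-\alpha} = \Phi_0^{1-\alpha} - c(1-\alpha)\,t
                            \ \text{ until } t = \frac{\Phi_0^{1-\alpha}}{c(1-\alpha)},\\
&\text{prescribed-time:} && r = \frac{c\,\Phi}{T-t},\ c>0,        && \Phi(t) = \Phi_0\,(1 - t/T)^{c},\quad 0 \le t < T.
\end{aligned}
```

Each choice fixes the decay profile of $\Phi$ regardless of the underlying objective, which is the sense in which such rates are user-selectable.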

Paper Structure (15 sections, 5 theorems, 19 equations, 3 figures, 2 tables)

Key Result

Proposition III.1

Under the paper's smoothness assumption, $\Phi \in C^{1,1}(\mathbb{R}^n)$: the gradient $\nabla \Phi$ exists everywhere and is locally Lipschitz. In particular, $\nabla \Phi$ is well-defined even at eigenvalue crossings.
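Eigenvalue crossings are where smoothness is delicate, because the smallest eigenvalue of a smoothly varying symmetric matrix need not be differentiable where two eigenvalues meet. A minimal illustration, using a hypothetical one-parameter Hessian family rather than anything from the paper:

```latex
H(t) = \begin{pmatrix} t & 0 \\ 0 & -t \end{pmatrix},
\qquad
\lambda_{\min}\bigl(H(t)\bigr) = \min(t,\,-t) = -\lvert t\rvert,
\qquad
\frac{d}{dt}\,\lambda_{\min}\bigl(H(t)\bigr) =
\begin{cases} +1, & t < 0,\\ -1, & t > 0. \end{cases}
```

The proposition's claim is that $\Phi$ keeps a locally Lipschitz gradient despite kinks of this kind in $\lambda_{\min}$.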

Figures (3)

  • Figure 1: Three-fold potential: GD converges to the saddle (x marker), while CRGD escapes to the outer local minimum.
  • Figure 2: Augmented cost $\Phi(\mathbf{x}(t))$ on the three-fold potential. GD (gray) stalls near the saddle; the four CRGD curves each track the theoretical solution (dashed) until $\Phi_{\mathrm{eq}} \approx 0.019$. The prescribed-time law converges at exactly $t = T = 0.1$ s.
  • Figure 3: Eigenvalue gap sweep ($n = 50$, adversarial initial condition). GD convergence time scales as $O(1/\delta)$; CRGD scales as $O(\delta)$. Gray reference lines: $\delta^{-1}\ln(1/\varepsilon)$ and $\delta/5$.
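The $\delta^{-1}\ln(1/\varepsilon)$ reference line for GD follows from linearizing gradient flow at a strict saddle: along an unstable direction with Hessian eigenvalue $-\delta$, a component of size $\varepsilon$ grows as $\varepsilon e^{\delta t}$, so reaching unit distance takes about $\delta^{-1}\ln(1/\varepsilon)$. The sketch below checks this numerically on a hypothetical two-dimensional quadratic saddle $f(x, y) = \tfrac12 x^2 - \tfrac12 \delta y^2$. The toy function, the step size `h`, the escape radius, and the reading of $\varepsilon$ as the initial offset along the unstable direction are all assumptions, not the paper's $n = 50$ benchmark.

```python
import math


def gd_escape_time(delta: float, eps: float, h: float = 1e-2, radius: float = 1.0) -> float:
    """Time for gradient descent to leave a quadratic strict saddle.

    Hypothetical toy model (not the paper's benchmark):
        f(x, y) = 0.5 * x**2 - 0.5 * delta * y**2,
    whose Hessian at the origin has eigenvalues +1 and -delta.
    GD starts with offset `eps` along the unstable y-direction. The stable
    x-coordinate decouples and simply contracts, so only y is tracked.
    Returns the elapsed time steps * h when |y| first reaches `radius`.
    """
    if delta <= 0 or not 0 < eps < radius:
        raise ValueError("need delta > 0 and 0 < eps < radius")
    y = eps
    steps = 0
    while abs(y) < radius:
        # df/dy = -delta * y, so the GD step y <- y - h * df/dy multiplies y by (1 + h * delta)
        y = y + h * delta * y
        steps += 1
    return steps * h


def predicted_escape_time(delta: float, eps: float, radius: float = 1.0) -> float:
    """Continuous-time prediction ln(radius / eps) / delta from y(t) = eps * exp(delta * t)."""
    return math.log(radius / eps) / delta


if __name__ == "__main__":
    eps = 1e-6
    for delta in (0.05, 0.1, 0.2, 0.4):
        print(f"delta={delta:<5} GD time={gd_escape_time(delta, eps):8.2f} "
              f"prediction={predicted_escape_time(delta, eps):8.2f}")
```

Halving $\delta$ roughly doubles the escape time, the $1/\delta$ scaling that Figure 3 reports for GD. The simulated time sits slightly above the prediction because $\ln(1 + h\delta)$ is a little smaller than $h\delta$, so the discrete iteration needs a few extra steps.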

Theorems & Definitions (13)

  • Definition II.1: Strict Saddle Point
  • Proposition III.1: Regularity of $\Phi$
  • Proof
  • Remark III.3: Role of $\beta$ and Computational Cost
  • Proposition III.4: Saddle Points are Non-Critical for $\Phi$
  • Proof
  • Proposition III.5: Spurious Points are Saddles of $\Phi$
  • Proof
  • Remark III.6
  • Theorem III.7: Convergence to SOSP with Selectable Rate
  • ...and 3 more
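Definition II.1 and Theorem III.7 are phrased in terms of strict saddle points and second-order stationary points (SOSPs). The sketch below encodes the textbook versions of these notions from a given gradient and Hessian. The function name, the tolerance `tol`, and the string labels are hypothetical, and the paper's Definition II.1 may differ in detail (for instance, by requiring the negative eigenvalue to be bounded away from zero).

```python
import numpy as np


def classify_stationary_point(grad, hess, tol: float = 1e-8) -> str:
    """Classify a point from its gradient and Hessian (textbook definitions).

    - "non-stationary":          ||grad|| > tol
    - "strict saddle":           stationary and lambda_min(hess) < -tol
    - "second-order stationary": stationary and lambda_min(hess) >= -tol
    """
    grad = np.asarray(grad, dtype=float)
    hess = np.asarray(hess, dtype=float)
    if np.linalg.norm(grad) > tol:
        return "non-stationary"
    # Symmetrize to guard against round-off; eigvalsh returns eigenvalues in ascending order.
    lam_min = np.linalg.eigvalsh(0.5 * (hess + hess.T))[0]
    if lam_min < -tol:
        return "strict saddle"
    return "second-order stationary"


if __name__ == "__main__":
    # f(x, y) = 0.5 * x**2 - 0.5 * y**2 at the origin
    print(classify_stationary_point([0.0, 0.0], [[1.0, 0.0], [0.0, -1.0]]))
    # Monkey saddle f(x, y) = x**3 - 3 * x * y**2 at the origin: the Hessian is zero
    print(classify_stationary_point([0.0, 0.0], [[0.0, 0.0], [0.0, 0.0]]))
```

The zero-Hessian case shows why the strict-saddle qualifier matters: a degenerate critical point such as the origin of the monkey saddle passes this second-order test even though it is not a local minimum.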