Saddle Point Evasion via Curvature-Regularized Gradient Dynamics

Liraz Mudrik; Isaac Kaminer; Sean Kragelund; Abram H. Clark

Abstract

Nonconvex optimization underlies many modern machine learning and control tasks, where saddle points pose the dominant obstacle to reliable convergence in high-dimensional settings. Escaping these saddle points deterministically and at a controllable rate remains an open challenge: gradient descent is blind to curvature, stochastic perturbation methods lack deterministic guarantees, and Newton-type approaches suffer from Hessian singularity. We present Curvature-Regularized Gradient Dynamics (CRGD), which augments the objective with a smooth penalty on the most negative Hessian eigenvalue, yielding an augmented cost that serves as an optimization Lyapunov function with user-selectable convergence rates to second-order stationary points. Numerical experiments on a nonconvex matrix factorization example confirm that CRGD escapes saddle points across all tested configurations, with escape time that shrinks as the eigenvalue gap shrinks, in contrast to gradient descent, whose escape time grows inversely with the gap.

Paper Structure (15 sections, 5 theorems, 19 equations, 3 figures, 2 tables)

Introduction
Problem Statement and Preliminaries
Optimization Lyapunov Functions
Standing Assumptions
Curvature-Regularized Gradient Dynamics
The Augmented Cost
Regularity and Gradient
Dynamics
Convergence Analysis
Numerical Validation
2D Example: Three-Fold Potential
High-Dimensional Example: Matrix Factorization
Eigenvalue Gap Sweep (Adversarial Initialization)
Monte Carlo Study
Conclusion

Key Result

Proposition III.1

Under the standing smoothness assumption, $\Phi \in C^{1,1}(\mathbb{R}^n)$: the gradient $\nabla \Phi$ exists everywhere and is locally Lipschitz. In particular, $\nabla \Phi$ is well-defined even at eigenvalue crossings.
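
To build intuition for why a smoothed eigenvalue term can remain differentiable at crossings, the sketch below compares the exact minimum eigenvalue of a 2x2 symmetric matrix with a log-sum-exp soft minimum. This is an illustration under assumptions, not the paper's construction: this overview does not state the form of the penalty, and the names `sym2_eigs`, `softmin_eig`, `slope`, and the sharpness parameter `beta` are hypothetical and need not correspond to the paper's $\beta$.

```python
import math

def sym2_eigs(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], in ascending order."""
    mean = 0.5 * (a + c)
    rad = math.hypot(0.5 * (a - c), b)
    return mean - rad, mean + rad

def softmin_eig(a, b, c, beta):
    """Log-sum-exp soft minimum of the two eigenvalues (illustrative assumption)."""
    lo, hi = sym2_eigs(a, b, c)
    # -(1/beta) * log(exp(-beta*lo) + exp(-beta*hi)), factored for stability.
    return lo - math.log(1.0 + math.exp(-beta * (hi - lo))) / beta

def slope(fn, t, h=1e-6):
    """Central finite-difference derivative of fn at t."""
    return (fn(t + h) - fn(t - h)) / (2.0 * h)

# Path H(t) = diag(t, -t): the two eigenvalues cross at t = 0.
hard = lambda t: sym2_eigs(t, 0.0, -t)[0]            # equals -|t|
soft = lambda t: softmin_eig(t, 0.0, -t, beta=10.0)  # equals -log(2*cosh(10*t))/10

# The exact minimum has a kink: its slope jumps across the crossing.
hard_left, hard_right = slope(hard, -1e-3), slope(hard, 1e-3)
# The soft minimum has slope -tanh(beta*t), which varies continuously through t = 0.
soft_left, soft_right = slope(soft, -1e-3), slope(soft, 1e-3)
print(hard_left, hard_right, soft_left, soft_right)
```

In this toy setting a larger `beta` tightens the approximation (for two eigenvalues the soft minimum lies within `log(2) / beta` of the exact minimum) at the cost of a steeper change in slope near the crossing.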

Figures (3)

Figure 1: Three-fold potential: GD converges to the saddle (x marker), while CRGD escapes to the outer local minimum.
Figure 2: Augmented cost $\Phi(\mathbf{x}(t))$ on the three-fold potential. GD (gray) stalls near the saddle; the four CRGD curves each track the theoretical solution (dashed) until $\Phi_{\mathrm{eq}} \approx 0.019$. The prescribed-time law converges at exactly $t = T = 0.1$ s.
Figure 3: Eigenvalue gap sweep ($n = 50$, adversarial IC). GD convergence time scales as $O(1/\delta)$; CRGD scales as $O(\delta)$. Gray reference lines: $\delta^{-1}\ln(1/\varepsilon)$ and $\delta/5$.
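
The $\delta^{-1}\ln(1/\varepsilon)$ reference line for GD in Figure 3 can be reproduced on a toy problem. The sketch below is a minimal illustration, assuming a two-dimensional quadratic saddle $f(x, y) = \tfrac{1}{2}x^2 - \tfrac{1}{2}\delta y^2$ in which $\delta$ is the magnitude of the negative curvature and $\varepsilon$ is the initial offset along the escape direction. It is not the paper's matrix factorization experiment, and `gd_escape_time`, `EPS`, `step`, and `radius` are hypothetical names chosen for this example.

```python
import math

EPS = 1e-6  # initial offset along the unstable direction (assumed value)

def gd_escape_time(delta, eps=EPS, step=1e-3, radius=1.0):
    """Elapsed time (steps * step size) for gradient descent on
    f(x, y) = 0.5*x**2 - 0.5*delta*y**2 to go from (1, eps) to |y| >= radius."""
    x, y, t = 1.0, eps, 0.0
    while abs(y) < radius:
        x -= step * x             # df/dx = x: contracts toward the saddle
        y -= step * (-delta * y)  # df/dy = -delta*y: grows by (1 + step*delta) per step
        t += step
    return t

for delta in (0.1, 0.2, 0.4):
    predicted = math.log(1.0 / EPS) / delta  # continuous-time estimate
    print(delta, gd_escape_time(delta), predicted)
```

Halving `delta` roughly doubles the escape time in this model, which is the $O(1/\delta)$ behavior the caption attributes to GD.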

Theorems & Definitions (13)

Definition II.1: Strict Saddle Point
Proposition III.1: Regularity of $\Phi$
proof
Remark III.3: Role of $\beta$ and Computational Cost
Proposition III.4: Saddle Points are Non-Critical for $\Phi$
proof
Proposition III.5: Spurious Points are Saddles of $\Phi$
proof
Remark III.6
Theorem III.7: Convergence to SOSP with Selectable Rate
...and 3 more
