A Globally Convergent Third-Order Newton Method via Unified Semidefinite Programming Subproblems

Yubo Cai, Wenqi Zhu, Coralia Cartis, Gioele Zardini

Abstract

We propose the Adaptive Levenberg-Marquardt Third-Order Newton Method (ALMTON) for unconstrained nonconvex optimization, providing the first globally convergent realization of the unregularized third-order Newton method. Unlike the standard Adaptive Regularization framework with third-order models (AR3), which enforces global behavior through a quartic term, ALMTON employs an adaptive Levenberg-Marquardt (quadratic) regularization. This choice preserves a cubic model at every iteration, so that every subproblem is a tractable semidefinite program (SDP). In particular, the ALMTON-Simple variant requires exactly one SDP solve per iteration, making the per-iteration cost uniform and predictable. Algorithmically, ALMTON follows a mixed-mode strategy: it attempts an unregularized third-order step whenever the cubic Taylor model admits a strict local minimizer with adequate curvature, and activates (or increases) quadratic regularization only when needed to ensure that the model is well posed and the step is globally reliable. We establish global convergence and prove an $O(\epsilon^{-2})$ worst-case evaluation complexity for computing an $\epsilon$-approximate first-order stationary point. Numerical experiments show that ALMTON enlarges the basin of attraction relative to classical baselines (gradient descent and damped Newton) and can make progress on landscapes where second-order methods typically stagnate. Compared with a state-of-the-art third-order implementation (AR3-interp), ALMTON converges more consistently and often in fewer iterations. We also characterize the practical scalability limits of the approach, highlighting the computational bottlenecks introduced by current SDP solvers as the dimension grows.
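The mixed-mode strategy can be illustrated in one dimension. There, the strict local minimizer of the cubic model has a closed form, so no SDP solve is needed. What follows is a minimal sketch under stated assumptions, not the authors' algorithm. The helper names `cubic_step` and `almton_1d_sketch` are hypothetical, as are the parameters `sigma0`, `gamma`, `eta` and `curv_min` and the test function. In place of the paper's mixed ratio test, whose exact form is not given here, the sketch uses a standard actual-versus-predicted ratio test. Following the abstract, it first tries the unregularized cubic step ($\sigma = 0$). It raises $\sigma$ only until the model has a strict local minimizer that yields a positive model decrease, and it resets $\sigma$ to zero after a successful step.

```python
import math


def cubic_step(g, H, T, sigma, curv_min=1e-10):
    """Strict local minimizer of the 1-D regularized cubic model
    m(s) = g*s + 0.5*(H + sigma)*s**2 + (T/6)*s**3, or None if none exists.
    curv_min is a hypothetical 'adequate curvature' threshold on m''(s)."""
    a, b, c = T / 2.0, H + sigma, g  # m'(s) = a*s**2 + b*s + c
    if abs(a) < 1e-14:
        # Model is (numerically) quadratic: strict minimizer only if m'' = b > 0.
        return -c / b if b > curv_min else None
    disc = b * b - 4.0 * a * c
    if disc <= curv_min ** 2:
        return None  # no critical point with m''(s) > curv_min
    r = math.sqrt(disc)  # m''(s) = +r at the root chosen below
    # Root (-b + r) / (2a), written in a cancellation-free form when b > 0.
    return 2.0 * c / (-b - r) if b > 0 else (-b + r) / (2.0 * a)


def almton_1d_sketch(f, df, d2f, d3f, x0, sigma0=1.0, gamma=2.0, eta=0.1,
                     tol=1e-8, max_iter=200):
    """Hypothetical 1-D illustration of an adaptive-LM third-order loop."""
    x, sigma = float(x0), 0.0  # sigma = 0: try the unregularized step first
    for _ in range(max_iter):
        g, H, T = df(x), d2f(x), d3f(x)
        if abs(g) <= tol:
            break
        # Model phase: raise sigma only until the cubic model has a strict
        # local minimizer that gives a positive (regularized) model decrease.
        while True:
            s = cubic_step(g, H, T, sigma)
            if s is not None:
                pred = -(g * s + 0.5 * (H + sigma) * s * s + T * s ** 3 / 6.0)
                if pred > 0:
                    break
            sigma = max(gamma * sigma, sigma0)
        rho = (f(x) - f(x + s)) / pred  # actual vs. predicted decrease
        if rho >= eta:
            x, sigma = x + s, 0.0  # success: accept step, drop regularization
        else:
            sigma = max(gamma * sigma, sigma0)  # failure: regularize more
    return x


# Example: f(x) = x**4 + x has a single minimizer at x* = -(1/4)**(1/3).
f = lambda x: x ** 4 + x
x_star = almton_1d_sketch(f, lambda x: 4 * x ** 3 + 1, lambda x: 12 * x ** 2,
                          lambda x: 24 * x, x0=2.0)
print(x_star)
```

In $n$ dimensions, the step computation in `cubic_step` is the part that the paper replaces with an SDP subproblem. The outer accept/reject logic keeps the same shape.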

Paper Structure (26 sections, 15 theorems, 69 equations, 5 figures, 2 tables, 3 algorithms)

Key Result

Lemma 2.3

Under Assumption 1, for all $x, x_k \in \mathbb{R}^n$,

Figures (5)

  • Figure 1: Flowchart of ALMTON. The model phase (Step 2) escalates $\sigma$ only as needed to enforce the invariants in Proposition 3.2. Step 3 applies the mixed ratio test. Step 4 resets $\sigma$ after success, or increases it after failure, before the next iteration.
  • Figure 1: Illustration of iteration batches in the optimization process.
  • Figure 1: Dolan-Moré performance profiles based on iteration counts, comparing the proposed variants with baseline algorithms on a test set of $P=3600$ problem instances. Each curve gives the fraction $\rho_s(\tau)$ of problems that algorithm $s$ solves within a factor $\tau$ of the best performance.
  • Figure 2: Dolan-Moré performance profiles based on total wall-clock time, comparing the proposed variants with baseline algorithms on a test set of $P=3600$ problem instances. Each curve gives the fraction $\rho_s(\tau)$ of problems that algorithm $s$ solves within a factor $\tau$ of the best performance.
  • Figure 3: Trajectory comparison on high-order geometric structures. Left: on the Slalom function, Newton's method (orange) exhibits severe zig-zagging, while ALMTON (red) follows the valley's geodesic with high path efficiency. Right: on the Hairpin Turn, ALMTON successfully navigates the sharp bend defined by the barrier functions, whereas second-order methods (Newton, AR$2$) stagnate or oscillate.
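The performance profiles listed above use the Dolan-Moré construction. Let $t_{p,s}$ be the cost of solver $s$ on problem $p$, measured in iterations or wall-clock time. The performance ratio is $r_{p,s} = t_{p,s} / \min_{s'} t_{p,s'}$, and $\rho_s(\tau)$ is the fraction of the $P$ problems with $r_{p,s} \le \tau$. A failure counts as $r_{p,s} = \infty$. Below is a minimal pure-Python sketch of that computation. The function name `performance_profile` and the toy cost matrix are illustrative only and are not the paper's data.

```python
import math


def performance_profile(costs, taus):
    """Dolan-Moré profile. costs[p][s] = positive cost of solver s on problem p,
    or math.inf if it failed. Returns rho[t][s] for each tau in taus."""
    n_problems, n_solvers = len(costs), len(costs[0])
    ratios = []
    for row in costs:
        best = min(row)
        # Failures (including problems no solver solved) get an infinite ratio.
        ratios.append([c / best if math.isfinite(c) else math.inf for c in row])
    return [[sum(r[s] <= tau for r in ratios) / n_problems
             for s in range(n_solvers)]
            for tau in taus]


# Toy example: 3 problems, 2 solvers; solver 1 fails on the last problem.
costs = [[10, 20], [30, 15], [5, math.inf]]
profile = performance_profile(costs, [1.0, 2.0])
print(profile)
```

At $\tau = 1$, the profile gives the fraction of problems on which each solver was (jointly) fastest. As $\tau \to \infty$, it approaches the fraction of problems the solver solves at all.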

Theorems & Definitions (35)

  • Remark 2.1: Unified subproblem solve
  • Remark 2.2
  • Lemma 2.3: Taylor model bounds
  • Proof 1
  • Corollary 2.4: Specialization to $p=3$
  • Remark 3.1
  • Proposition 3.2: Model-phase invariants
  • Proof 2
  • Lemma 3.3: Exact identity for the unregularized model decrease
  • Proof 3
  • ...and 25 more
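Lemma 2.3 (Taylor model bounds) and Corollary 2.4 (its specialization to $p=3$) are listed above without their statements. For orientation only, assume that the third derivative is Lipschitz continuous with constant $L_3$. We have not confirmed that this matches the assumption under which Lemma 2.3 is stated, nor its constants. Under that assumption, the standard third-order bounds are $|f(x_k+s) - T_3(x_k,s)| \le \frac{L_3}{4!}\|s\|^4$ and $\|\nabla f(x_k+s) - \nabla_s T_3(x_k,s)\| \le \frac{L_3}{3!}\|s\|^3$. The snippet below checks both numerically for the one-dimensional example $f = \cos$, whose third derivative $\sin$ is 1-Lipschitz. The function names are illustrative.

```python
import math

L3 = 1.0  # for f = cos, f''' = sin is Lipschitz with constant 1


def t3(xk, s):
    """Cubic Taylor model of cos around xk, evaluated at step s."""
    return (math.cos(xk) - math.sin(xk) * s - 0.5 * math.cos(xk) * s ** 2
            + math.sin(xk) * s ** 3 / 6.0)


def t3_grad(xk, s):
    """Derivative of t3 with respect to the step s."""
    return -math.sin(xk) - math.cos(xk) * s + 0.5 * math.sin(xk) * s ** 2


worst_value_ratio = worst_grad_ratio = 0.0
for xk in (-2.0, -0.5, 0.0, 1.0, 3.0):
    for k in range(-60, 61):
        if k == 0:
            continue  # both bounds are trivially 0 at s = 0
        s = 0.05 * k
        value_err = abs(math.cos(xk + s) - t3(xk, s))
        grad_err = abs(-math.sin(xk + s) - t3_grad(xk, s))
        # Ratio of observed error to the Lipschitz-based bound (should be <= 1).
        worst_value_ratio = max(worst_value_ratio, value_err / (L3 / 24 * abs(s) ** 4))
        worst_grad_ratio = max(worst_grad_ratio, grad_err / (L3 / 6 * abs(s) ** 3))

print(worst_value_ratio, worst_grad_ratio)
```

Both ratios approach 1 for small steps around $x_k = 0$, which indicates that the constants $1/4!$ and $1/3!$ cannot be improved in general.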