A Globally Convergent Third-Order Newton Method via Unified Semidefinite Programming Subproblems

Yubo Cai; Wenqi Zhu; Coralia Cartis; Gioele Zardini

Abstract

We propose the Adaptive Levenberg-Marquardt Third-Order Newton Method (ALMTON) for unconstrained nonconvex optimization, providing the first globally convergent realization of the unregularized third-order Newton method. Unlike the standard Adaptive Regularization framework with third-order models (AR3), which enforces global behavior through a quartic term, ALMTON employs an adaptive Levenberg-Marquardt (quadratic) regularization. This choice preserves a cubic model at every iteration, so that every subproblem is a tractable semidefinite program (SDP). In particular, the ALMTON-Simple variant requires exactly one SDP solve per iteration, making the per-iteration cost uniform and predictable. Algorithmically, ALMTON follows a mixed-mode strategy: it attempts an unregularized third-order step whenever the cubic Taylor model admits a strict local minimizer with adequate curvature, and activates (or increases) quadratic regularization only when needed to ensure that the model is well posed and the step is globally reliable. We establish global convergence and prove an $O(ε^{-2})$ worst-case evaluation complexity for computing an $ε$-approximate first-order stationary point. Numerical experiments show that ALMTON enlarges the basin of attraction relative to classical baselines (gradient descent and damped Newton), and can progress on landscapes where second-order methods typically stagnate. When compared with a state-of-the-art third-order implementation (AR3-interp), ALMTON converges more consistently and often in fewer iterations. We also characterize the practical scalability limits of the approach, highlighting the computational bottlenecks introduced by current SDP solvers as dimension grows.
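
The mixed-mode logic described above can be illustrated on a scalar toy problem. The sketch below is a hypothetical one-dimensional analogue, not the paper's algorithm. The following are all our assumptions:

- the function names (`cubic_local_min`, `taylor_decrease`, `almton_like_1d`);
- the parameters (`sigma_min`, `gamma`, `eta`, `kappa`);
- the acceptance test, which divides the actual decrease of $f$ by the decrease of the unregularized cubic Taylor model.

In one dimension the regularized cubic model can be minimized in closed form, so no SDP solve is needed. In $n$ dimensions this is the subproblem that ALMTON casts as an SDP.

```python
import math
from typing import Callable, Optional


def cubic_local_min(g: float, h: float, t: float, sigma: float,
                    kappa: float = 1e-8) -> Optional[float]:
    """Strict local minimizer of m(s) = g*s + (h + sigma)*s**2/2 + t*s**3/6.

    Returns None when no local minimizer with curvature m''(s) >= kappa exists.
    """
    a = h + sigma
    if t == 0.0:
        # Quadratic model: well posed only with positive curvature.
        return -g / a if a >= kappa else None
    # Stationarity m'(s) = g + a*s + t*s**2/2 = 0 has two real roots iff disc > 0.
    disc = a * a - 2.0 * t * g
    if disc <= 0.0:
        return None
    root = math.sqrt(disc)
    # At s = (root - a)/t the curvature is m''(s) = a + t*s = root > 0.
    # Both formulas give that same root; each avoids cancellation in its case.
    s = -2.0 * g / (a + root) if a >= 0.0 else (root - a) / t
    return s if root >= kappa else None


def taylor_decrease(g: float, h: float, t: float, s: float) -> float:
    """Decrease m(0) - m(s) of the unregularized cubic Taylor model."""
    return -(g * s + 0.5 * h * s * s + t * s ** 3 / 6.0)


def almton_like_1d(f: Callable[[float], float], df: Callable[[float], float],
                   d2f: Callable[[float], float], d3f: Callable[[float], float],
                   x: float, sigma_min: float = 1e-3, gamma: float = 2.0,
                   eta: float = 0.1, tol: float = 1e-8,
                   max_iter: int = 100) -> float:
    sigma = 0.0  # always try the unregularized third-order step first
    for _ in range(max_iter):
        g, h, t = df(x), d2f(x), d3f(x)
        if abs(g) <= tol:
            break
        # Model phase: raise sigma until the model has a strict local
        # minimizer that also decreases the unregularized Taylor model.
        while True:
            s = cubic_local_min(g, h, t, sigma)
            if s is not None and taylor_decrease(g, h, t, s) > 0.0:
                break
            sigma = max(gamma * sigma, sigma_min)
        # Ratio test: actual decrease of f over predicted (unregularized) decrease.
        rho = (f(x) - f(x + s)) / taylor_decrease(g, h, t, s)
        if rho >= eta:
            x += s       # success: accept the step and drop the regularization
            sigma = 0.0
        else:
            sigma = max(gamma * sigma, sigma_min)  # failure: regularize more
    return x
```

Take $f(x) = x^4/4 - x$ started at $x = 0$. There $f'' = f''' = 0$, so the unregularized model has no minimizer. The sketch first escalates $\sigma$ until a step is accepted. It then takes unregularized third-order steps that converge rapidly to $x = 1$.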

Paper Structure

This paper contains 26 sections, 15 theorems, 69 equations, 5 figures, 2 tables, and 3 algorithms.

Introduction
Contributions and Organization
Preliminaries and Notation
Norms and tensor notation
Taylor models
Cubic polynomials and subproblem formulation
Higher-order Optimization Algorithms
Unregularized third-order Newton and LM regularization
ARp models
Trade-offs motivating our approach
Standing assumptions and basic bounds
Strategy 1: The Simple Variant
Strategy 2: The Heuristic Variant
Phase notation
First- and second-order conditions at the trial point
...and 11 more sections

Key Result

Lemma 2.3

Under the standing assumption (assump:1), for all $x, x_k \in \mathbb{R}^n$, …
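
The statement of Lemma 2.3 is cut off above. The block below shows the classical Taylor bounds of this type, for orientation only. It rests on one assumption not confirmed by this summary: that the standing assumption (assump:1) requires $\nabla^p f$ to be globally Lipschitz continuous with constant $L_p$, as is standard in the ARp literature. The exact form and constants of the paper's lemma may differ. Here $s = x - x_k$ and $T_p(x_k, s)$ is the $p$-th order Taylor polynomial. The last line is the $p = 3$ case, which is the setting of Corollary 2.4.

```latex
\begin{aligned}
T_p(x_k,s) &= f(x_k) + \sum_{j=1}^{p} \tfrac{1}{j!}\,\nabla^j f(x_k)[s]^j, \qquad s = x - x_k,\\
|f(x) - T_p(x_k,s)| &\le \tfrac{L_p}{(p+1)!}\,\|s\|^{p+1},\\
\|\nabla f(x) - \nabla_s T_p(x_k,s)\| &\le \tfrac{L_p}{p!}\,\|s\|^{p},\\
\text{for } p = 3:\quad |f(x) - T_3(x_k,s)| &\le \tfrac{L_3}{24}\,\|s\|^{4},
\qquad \|\nabla f(x) - \nabla_s T_3(x_k,s)\| \le \tfrac{L_3}{6}\,\|s\|^{3}.
\end{aligned}
```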

Figures (5)

Figure 1: Flowchart of ALMTON. The model phase (Step 2) increases $\sigma$ only as far as needed to enforce the invariants of Proposition 3.2. Step 3 applies the mixed ratio test. Step 4 either resets $\sigma$ after a successful iteration or increases it after an unsuccessful one, before the next iteration begins.
Figure 1: Illustration of iteration batches in the optimization process.
Figure 1: Dolan-Moré performance profiles based on iteration counts comparing variants with baseline algorithms on a test set of $P=3600$ problem instances. The curves represent the fraction of problems $\rho_s(\tau)$ solved by each algorithm within a factor $\tau$ of the best performance.
Figure 2: Dolan-Moré performance profiles based on total wall-clock time comparing variants with baseline algorithms on a test set of $P=3600$ problem instances. The curves represent the fraction of problems $\rho_s(\tau)$ solved by each algorithm within a factor $\tau$ of the best performance.
Figure 3: Trajectory comparison on high-order geometric structures. Left: on the Slalom function, Newton's method (orange) exhibits severe zig-zagging, while ALMTON (red) follows the valley's geodesic with high path efficiency. Right: on the Hairpin Turn, ALMTON successfully navigates the sharp bend defined by the barrier functions, whereas the second-order methods (Newton, AR$2$) stagnate or oscillate.
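
The performance-profile figures above use the standard Dolan-Moré construction:

- $t_{p,s}$ is the cost (iterations or wall-clock time) of solver $s$ on problem $p$;
- $r_{p,s} = t_{p,s} / \min_{s'} t_{p,s'}$ is the performance ratio;
- $\rho_s(\tau) = |\{p : r_{p,s} \le \tau\}| / P$ is the profile value.

The sketch below is a minimal, hypothetical implementation, not the code used to produce the figures. The function name `performance_profile` and the convention of recording a failure as `math.inf` are our assumptions. Costs are assumed strictly positive.

```python
import math
from typing import Dict, List, Sequence


def performance_profile(costs: Dict[str, Sequence[float]],
                        taus: Sequence[float]) -> Dict[str, List[float]]:
    """Dolan-More profile: fraction of problems each solver solves within tau of the best.

    costs[s][p] is the (positive) cost of solver s on problem p; math.inf marks a failure.
    """
    solvers = list(costs)
    n_problems = len(costs[solvers[0]])
    # Best cost achieved by any solver on each problem.
    best = [min(costs[s][p] for s in solvers) for p in range(n_problems)]
    profile = {}
    for s in solvers:
        # Performance ratio r_{p,s}; failures get an infinite ratio.
        ratios = [costs[s][p] / best[p] if math.isfinite(costs[s][p]) else math.inf
                  for p in range(n_problems)]
        # rho_s(tau): fraction of the P problems with r_{p,s} <= tau.
        profile[s] = [sum(r <= tau for r in ratios) / n_problems for tau in taus]
    return profile
```

Here $\rho_s(1)$ is the fraction of problems on which solver $s$ is (jointly) the best. The limit of $\rho_s(\tau)$ as $\tau$ grows is the fraction of problems it solves at all.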

Theorems & Definitions (35)

Remark 2.1: Unified subproblem solve
Remark 2.2
Lemma 2.3: Taylor model bounds
Proof 1
Corollary 2.4: Specialization to $p=3$
Remark 3.1
Proposition 3.2: Model-phase invariants
Proof 2
Lemma 3.3: Exact identity for the unregularized model decrease
Proof 3
...and 25 more
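
Lemma 3.3 is listed above only by title. The derivation below is a hedged illustration of why an exact identity for the unregularized model decrease is available; it follows from stationarity alone and need not coincide with the paper's statement. The notation is ours: $m(s) = f(x_k) + g^\top s + \tfrac12 s^\top H s + \tfrac16 T[s]^3$, with $g = \nabla f(x_k)$, $H = \nabla^2 f(x_k)$ and the symmetric tensor $T = \nabla^3 f(x_k)$. The derivation takes $s^*$ to be a stationary point of $m$.

```latex
\begin{aligned}
\nabla m(s^*) = 0 &\iff g + H s^* + \tfrac12\,T[s^*]^2 = 0
  \;\Longrightarrow\; g^\top s^* = -(s^*)^\top H s^* - \tfrac12\,T[s^*]^3,\\
m(0) - m(s^*) &= -g^\top s^* - \tfrac12 (s^*)^\top H s^* - \tfrac16\,T[s^*]^3
  = \tfrac12 (s^*)^\top H s^* + \tfrac13\,T[s^*]^3,\\
\nabla^2 m(s^*) = H + T[s^*] &\;\Longrightarrow\;
  m(0) - m(s^*) = \tfrac16 (s^*)^\top H s^* + \tfrac13 (s^*)^\top \nabla^2 m(s^*)\, s^*.
\end{aligned}
```

Now suppose $s^*$ is a nonzero strict local minimizer with $\nabla^2 m(s^*) \succ 0$. Then the second term is positive, so the model decrease is guaranteed whenever $(s^*)^\top H s^* \ge 0$. If instead $H$ has sufficiently negative curvature along $s^*$, positivity is not automatic.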
