Higher-Order Newton Methods with Polynomial Work per Iteration

Amir Ali Ahmadi; Abraar Chaudhry; Jeffrey Zhang

TL;DR

This work introduces higher-order Newton methods of order $d$ that replace the $d$-th order Taylor expansion with an sos-convex polynomial and minimize it via semidefinite programs, achieving polynomial-in-$n$ per-iteration cost for fixed $d$ and local convergence of order $d$ without requiring global convexity. A globally convergent variant is provided for odd $d$ under additional smoothness and convexity-like assumptions, with a proven $O(k^{-d})$ decrease in objective value. Numerical experiments in one and two dimensions illustrate larger basins of attraction and faster convergence relative to classical Newton, including a Beale function example. The framework leverages SOS techniques and the first level of the Lasserre hierarchy to obtain tractable SDP formulations, and opens avenues for scalable relaxations and higher-order quasi-Newton extensions in nonconvex optimization. Overall, the paper offers a principled, polynomial-cost pathway to higher-order Newton methods with provable convergence guarantees and practical demonstrations.
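The two rate claims above can be stated in a standard form, sketched here for orientation. The symbols $x^*$ (a local minimizer), $f^*$ (the optimal value), and the constants $c$, $r$, $C$ are notation assumed for this sketch and are not taken from the paper; reading the "decrease in objective value" as a bound on $f(x_k) - f^*$ is likewise an assumption.

$$\|x_{k+1} - x^*\| \le c\,\|x_k - x^*\|^{d} \quad \text{whenever } \|x_k - x^*\| \le r \qquad \text{(local convergence of order } d\text{)},$$

$$f(x_k) - f^* \le \frac{C}{k^{d}} \qquad \text{(global rate for the odd-}d\text{ variant)}.$$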

Abstract

We present generalizations of Newton's method that incorporate derivatives of an arbitrary order $d$ but maintain a polynomial dependence on dimension in their cost per iteration. At each step, our $d^{\text{th}}$-order method uses semidefinite programming to construct and minimize a sum of squares-convex approximation to the $d^{\text{th}}$-order Taylor expansion of the function we wish to minimize. We prove that our $d^{\text{th}}$-order method has local convergence of order $d$. This results in lower oracle complexity compared to the classical Newton method. We show on numerical examples that basins of attraction around local minima can get larger as $d$ increases. Under additional assumptions, we present a modified algorithm, again with polynomial cost per iteration, which is globally convergent and has local convergence of order $d$.
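The step described in the abstract can be illustrated in one dimension, where no semidefinite program is needed because a univariate polynomial is sos-convex exactly when it is convex. The sketch below is not the paper's implementation. It assumes one simple way of building the surrogate for $d = 3$: add a quartic term $t h^4$ to the third-order Taylor expansion, with the smallest $t$ that makes the result convex. The helper names (`third_order_step`, `derivatives`, `run_third_order`) and the test function $\sqrt{1 + x^2}$ are hypothetical choices made for illustration.

```python
import math


def cbrt(v: float) -> float:
    """Real cube root that also handles negative inputs."""
    return math.copysign(abs(v) ** (1.0 / 3.0), v)


def third_order_step(g: float, H: float, T: float) -> float:
    """Step h minimizing g*h + H*h**2/2 + T*h**3/6 + t*h**4 with t = T**2/(48*H).

    t is the smallest quartic coefficient that makes this surrogate convex:
    its second derivative H + T*h + 12*t*h**2 then has zero discriminant.
    Assumption of this sketch: H > 0.
    """
    if H <= 0:
        raise ValueError("sketch assumes a positive second derivative")
    # Closed-form root of the cubic stationarity condition, arranged so that
    # no division by T occurs; for T == 0 it reduces to the Newton step -g/H.
    U = cbrt(8.0 * H**3 - 12.0 * H * g * T)
    return -12.0 * H * g / (U * U + 2.0 * H * U + 4.0 * H * H)


def newton_step(g: float, H: float) -> float:
    """Classical Newton step, for comparison."""
    return -g / H


def derivatives(x: float):
    """First three derivatives of the hypothetical test function sqrt(1 + x^2)."""
    s = math.sqrt(1.0 + x * x)
    return x / s, s**-3, -3.0 * x * s**-5


def run_third_order(x0: float, steps: int) -> float:
    """Apply the third-order step repeatedly, starting from x0."""
    x = x0
    for _ in range(steps):
        g, H, T = derivatives(x)
        x += third_order_step(g, H, T)
    return x
```

On this test function, whose minimizer is $x = 0$, the classical Newton step from $x_0 = 1.5$ moves farther from the minimizer, while the sketched third-order step moves closer. In several variables the convexity condition no longer has a closed form, which is where the semidefinite programs mentioned in the abstract come in; this sketch does not attempt that case.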

Paper Structure (15 sections, 12 theorems, 61 equations, 5 figures)

Introduction
Related Work
Organization and Contributions
Preliminaries
SOS-Convex Polynomial Optimization
Error rates of Taylor remainders
Algorithm Definition
Algorithm Analysis and Convergence
Numerical Examples
The Univariate Case
Example 1
Example 2
A Multivariate Example
Global convergence
Future directions

Key Result

Theorem 1

For a variable $x\in \mathbb{R}^n$ and an even integer $d$, let $\phi_{\frac{d}{2}}(x)$ denote the vector of all monomials of degree at most $\frac{d}{2}$ in $x$. A polynomial $p : \mathbb{R}^n \mapsto \mathbb{R}$ of degree $d$ is sos if and only if there exists a symmetric matrix $Q$ such that (i)

Figures (5)

Figure 1: A comparison of one iteration of the classical Newton method and our third-order Newton method applied to the function in \ref{Eq; square root function} starting at $x_0 = 1.5$.
Figure 2: 5th-order Newton iterates applied to the function in \ref{Eq; square root function}.
Figure 3: Comparison of the classical Newton map $N_2$ and our third-order Newton map $N_3$ applied to the function in \ref{Eq: arctan function}. Subfigure (a) implies that the third-order method is globally convergent, while the classical method is not. Subfigure (b) zooms in on the behavior of these maps near the origin to show that the basin of attraction for the classical method is approximately $(-1.712, 1.712)$.
Figure 4: Iterates of our third-order and the classical Newton method applied to the function in \ref{Eq: arctan function} starting from a point in the basin of attraction of both methods.
Figure 5: The basins of attraction for the classical and the third-order Newton methods for the minimizer of the Beale function. The basin for the classical method has fractal structure, demonstrating more sensitivity to initialization.

Theorems & Definitions (24)

Definition 1
Theorem 1: see, e.g., pablothesis
Definition 2: SOS-Convex
Theorem 2: See Corollary 2.5 from Lasserre2008, and Theorem 3.3 from lasserre2009
proof
Lemma 1: see, e.g., inequality (11) in baes2009estimate
Theorem 3
Theorem 4
Lemma 2
proof
...and 14 more
