Global Convergence of High-Order Regularization Methods with Sums-of-Squares Taylor Models

Wenqi Zhu; Coralia Cartis

TL;DR

This work develops an adaptive-regularization framework that embeds a Sum-of-Squares (SoS) Taylor model into high-order ($p\ge3$) optimization for nonconvex smooth functions. Each iteration minimizes a tractable SoS-based polynomial model via semidefinite programming, with a data-dependent perturbation $\delta=\epsilon^{a}$ ensuring well-conditioned subproblems. The authors prove global convergence and establish worst-case evaluation complexities: $\mathcal{O}(\epsilon^{-2})$ in general nonconvex settings, and improved rates $\mathcal{O}(\epsilon^{-1/p})$ for odd $p$ or $\mathcal{O}(\epsilon^{-(p+1)/p})$ for even $p$ when $f$ is strongly convex. The results rely on uniform upper bounds for regularization parameters and detailed analysis of function-value decrease in successful iterations. This work provides the first global-rate analysis for a tractable high-order subproblem in nonconvex optimization and sets the stage for further refinements and practical SDP-based implementations.
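
As a small worked illustration of why an SoS model can be handled by semidefinite programming: a polynomial is a sum of squares exactly when it can be written as $z^\top Q z$ for a vector of monomials $z$ and a positive semidefinite Gram matrix $Q$, so searching for an SoS certificate becomes a search over positive semidefinite matrices. The univariate quartic below is a hypothetical example chosen for readability; it is not taken from the paper.

$$
q(s) = s^4 - 2s^2 + 1, \qquad
z(s) = \begin{pmatrix} 1 \\ s \\ s^2 \end{pmatrix}, \qquad
Q = \begin{pmatrix} 1 & 0 & -1 \\ 0 & 0 & 0 \\ -1 & 0 & 1 \end{pmatrix}
  = \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}
    \begin{pmatrix} 1 & 0 & -1 \end{pmatrix} \succeq 0,
$$

$$
z(s)^\top Q\, z(s) = 1 - 2s^2 + s^4 = (1 - s^2)^2 = q(s).
$$

Here $Q$ has rank one, so $q$ is a single square. In general the Gram matrix is not unique, and an SDP solver searches the affine set of matrices that match the polynomial's coefficients for one that is positive semidefinite.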

Abstract

High-order tensor methods that employ Taylor-based local models (of degree $p\ge 3$) within adaptive regularization frameworks have recently been proposed for both convex and nonconvex optimization problems. They have been shown to have superior, and even optimal, worst-case global convergence rates and local rates compared to Newton's method. Finding rigorous and efficient techniques for minimizing the Taylor polynomial sub-problems remains a challenging aspect for these algorithms. Ahmadi et al. recently introduced a tensor method based on sum-of-squares (SoS) reformulations, so that each Taylor polynomial sub-problem in their approach can be tractably minimized using semidefinite programming (SDP); however, the global convergence and complexity of their method have not been addressed for general nonconvex problems. This paper introduces an algorithmic framework that combines the SoS Taylor model with adaptive regularization techniques for nonconvex smooth optimization problems. Each iteration minimizes an SoS Taylor model, offering a polynomial cost per iteration. For general nonconvex functions, the worst-case evaluation complexity bound is $\mathcal{O}(\epsilon^{-2})$, while for strongly convex functions, an improved evaluation complexity bound of $\mathcal{O}(\epsilon^{-\frac{1}{p}})$ is established. To the best of our knowledge, this is the first global rate analysis for an adaptive regularization algorithm with a tractable high-order sub-problem in nonconvex smooth optimization, opening the way for further improvements.
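
To make the adaptive-regularization loop concrete, here is a minimal one-dimensional sketch with $p=3$. It is not the paper's algorithm: in one dimension the regularized model $m(s) = g s + \tfrac{1}{2} h s^2 + \tfrac{1}{6} t s^3 + \tfrac{\sigma}{4} s^4$ can be minimized globally by root-finding, so no SoS reformulation or SDP solver is involved, and the perturbation $\delta$ is omitted. The function names (`minimize_model_1d`, `ar3_1d`), the test function, and the constants (thresholds 0.1 and 0.9, halving and doubling of $\sigma$) are illustrative assumptions in the style of standard adaptive regularization methods.

```python
import math

def minimize_model_1d(g, h, t, sigma):
    """Global minimizer of m(s) = g*s + h*s^2/2 + t*s^3/6 + sigma*s^4/4 (sigma > 0)."""
    m = lambda s: g * s + h * s * s / 2 + t * s**3 / 6 + sigma * s**4 / 4
    dm = lambda s: g + h * s + t * s * s / 2 + sigma * s**3
    # Cauchy bound: every real root of the cubic dm lies in [-R, R].
    R = 1.0 + max(abs(g), abs(h), abs(t) / 2) / sigma
    # dm is monotone between consecutive roots of m''(s) = h + t*s + 3*sigma*s^2.
    cuts = [-R, R]
    disc = t * t - 12.0 * sigma * h
    if disc > 0:
        r = math.sqrt(disc)
        cuts += [(-t - r) / (6 * sigma), (-t + r) / (6 * sigma)]
    cuts.sort()
    candidates = []
    for lo, hi in zip(cuts, cuts[1:]):
        if dm(lo) * dm(hi) > 0:
            continue  # no stationary point of the model in this interval
        for _ in range(200):  # plain bisection on a monotone piece
            mid = 0.5 * (lo + hi)
            if dm(lo) * dm(mid) <= 0:
                hi = mid
            else:
                lo = mid
        candidates.append(0.5 * (lo + hi))
    return min(candidates, key=m)

def ar3_1d(f, d1, d2, d3, x0, eps=1e-6, sigma0=1.0, max_iter=500):
    """Generic third-order adaptive regularization in 1D (illustrative only)."""
    x, sigma = x0, sigma0
    for k in range(max_iter):
        g, h, t = d1(x), d2(x), d3(x)
        if abs(g) <= eps:  # approximate first-order stationarity
            return x, k
        s = minimize_model_1d(g, h, t, sigma)
        # Decrease predicted by the Taylor part of the model (positive when g != 0).
        predicted = -(g * s + h * s * s / 2 + t * s**3 / 6)
        rho = (f(x) - f(x + s)) / predicted
        if rho >= 0.1:  # successful iteration: accept the step
            x += s
            if rho >= 0.9:  # very successful: relax the regularization
                sigma = max(sigma / 2, 1e-8)
        else:  # unsuccessful: keep x, regularize more strongly
            sigma *= 2
    return x, max_iter

# Hypothetical nonconvex test function with two local minimizers.
f = lambda x: x**4 - 3 * x**2 + x
d1 = lambda x: 4 * x**3 - 6 * x + 1
d2 = lambda x: 12 * x**2 - 6
d3 = lambda x: 24 * x

x_star, iters = ar3_1d(f, d1, d2, d3, x0=2.0)
print(x_star, iters, d1(x_star))
```

Unsuccessful iterations leave the iterate unchanged and only increase $\sigma$. An upper bound on the regularization parameter therefore limits how many such iterations can occur, which is one reason bounds of this kind appear in complexity analyses of adaptive regularization methods.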

Paper Structure (15 sections, 20 theorems, 84 equations, 1 figure, 1 table, 1 algorithm)

Introduction
The SoS Taylor Model with Adaptive Regularization
An Upper Bound on the Regularization Parameter
A Uniform Upper Bound on $\bar{\sigma}_k$
A Uniform Upper Bound on $\sigma^r_k$
Convergence and Complexity Analysis
Bounding Objective Decrease in Successful Iterations
Improved Function Value Decrease for $3 \le p \le 5$ in Locally Convex Iterations
Proof for Improved Function Value Decrease for $p=4,5$ in Locally Convex Iterations
Overall Complexity Bound
Improved Complexity for Strongly Convex Functions
Numerical Illustration of the Theoretical Bound
Conclusion
Proof of Lemma 3.1
A Discussion for $a > \frac{1}{2}$

Key Result

Lemma 3.1

(Related norms) Under the bounded-Hessian assumption, the following statements hold.

Figures (1)

Figure 1: Numerical experiment conducted with $p=3$ and $n=2$. Here, $g_k \in \mathbb{R}^2$ and the symmetric matrix $H_k \in \mathbb{R}^{2 \times 2}$ have entries drawn at random from $\mathcal{N}(0, 1)$. Left: $\delta = \mathcal{O}(1)$ is fixed and the symmetric tensor ${\cal T}_k \in \mathbb{R}^{2^3}$ has randomly generated entries, with its largest absolute entry varying from $1$ to $10^3$. Right: the symmetric tensor ${\cal T}_k \in \mathbb{R}^{2^3}$ is fixed, with entries drawn at random from $\mathcal{N}(0, 1)$, and $\delta$ varies from $10^{-3}$ to $10^0$.
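
The data described in the caption can be generated along the following lines. This is a hedged sketch, not the authors' experiment script: the caption does not say how symmetry is imposed or which quantity is plotted, so the helper names (`random_symmetric_tensor`, `contract`, `taylor3`), the seed, and the choice of one $\mathcal{N}(0,1)$ draw per unordered index set are assumptions. It only builds $g_k$, $H_k$, ${\cal T}_k$ for $n=2$ and evaluates the third-order Taylor polynomial in the step $s$.

```python
import itertools
import math
import random

def random_symmetric_tensor(n, order, rng):
    """Symmetric tensor stored as a dict {index tuple: value}.

    One N(0, 1) draw is shared by all permutations of an index tuple,
    so every entry is N(0, 1) and the tensor is symmetric.
    """
    draws = {}
    tensor = {}
    for idx in itertools.product(range(n), repeat=order):
        key = tuple(sorted(idx))
        if key not in draws:
            draws[key] = rng.gauss(0.0, 1.0)
        tensor[idx] = draws[key]
    return tensor

def contract(tensor, s):
    """Evaluate tensor[s, ..., s]: sum of tensor[idx] * prod(s[i] for i in idx)."""
    return sum(val * math.prod(s[i] for i in idx) for idx, val in tensor.items())

rng = random.Random(0)  # fixed seed, chosen only for reproducibility
n = 2
g = random_symmetric_tensor(n, 1, rng)  # gradient, 2 entries
H = random_symmetric_tensor(n, 2, rng)  # symmetric 2 x 2 matrix
T = random_symmetric_tensor(n, 3, rng)  # symmetric 2 x 2 x 2 tensor

def taylor3(s):
    """Third-order Taylor polynomial in the step s, without the constant term."""
    return contract(g, s) + contract(H, s) / 2 + contract(T, s) / 6

print(taylor3([0.1, -0.2]))
```

To vary the size of ${\cal T}_k$ as in the left panel, one option is to rescale the entries of `T` by a constant so that the largest absolute entry takes the desired value.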

Theorems & Definitions (54)

Remark 2.1
Definition 2.1
Definition 2.2
Definition 2.3
Remark 2.2
Remark 2.3
Remark 2.4
Definition 3.1
Definition 3.2
Lemma 3.1
...and 44 more
