Second-order methods for quartically-regularised cubic polynomials, with applications to high-order tensor methods

Coralia Cartis; Wenqi Zhu

TL;DR

This work develops the Quadratic Quartic Regularization (QQR) framework to efficiently minimize nonconvex quartically-regularised cubic polynomials arising in high-order tensor methods. By approximating the AR$3$ subproblem with a sequence of (possibly nonconvex) quadratic models augmented by quartic regularization, QQR achieves favorable complexity, including linear convergence in locally convex regions and robust progress in nonconvex settings. Two practical variants are analyzed: QQR-v1 with a single adaptive parameter and QQR-v2 with two adaptive parameters, each providing provable bounds and descent properties and connecting to Nesterov's linear convergence results in the convex case. Numerical experiments show QQR variants competitive with state-of-the-art ARC/AR$p$ methods, often reducing function evaluations and obtaining lower minima, particularly in tensor-dominant or ill-conditioned scenarios. Overall, QQR advances practical high-order optimization by delivering theoretically grounded, efficient solvers tailored to the AR$3$ subproblem.

Abstract

There has been growing interest in high-order tensor methods for nonconvex optimization, with adaptive regularization, as they possess better/optimal worst-case evaluation complexity globally and faster convergence asymptotically. These algorithms crucially rely on repeatedly minimizing nonconvex multivariate Taylor-based polynomial sub-problems, at least locally. Finding efficient techniques for the solution of these sub-problems, beyond the second-order case, has been an open question. This paper proposes a second-order method, Quadratic Quartic Regularisation (QQR), for efficiently minimizing nonconvex quartically-regularised cubic polynomials, such as the AR$p$ sub-problem [3] with $p=3$. Inspired by [35], QQR approximates the third-order tensor term by a linear combination of quadratic and quartic terms, yielding (possibly nonconvex) local models that are solvable to global optimality. In order to achieve accuracy $\epsilon$ in the first-order criticality of the sub-problem in finitely many iterations, we show that the error in the QQR method decreases either linearly or by at least $\mathcal{O}(\epsilon^{4/3})$ for locally convex iterations, while in the nonconvex case it decreases by at least $\mathcal{O}(\epsilon)$; thus improving, on these types of iterations, the general cubic-regularization bound. Preliminary numerical experiments indicate that two QQR variants perform competitively with state-of-the-art approaches such as ARC (also known as AR$p$ with $p=2$), achieving either a lower objective value or a lower iteration count.
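To make the object being minimized concrete, the following is a minimal sketch of a quartically-regularised cubic polynomial and its gradient. It assumes the usual AR$3$ sub-problem form $m_3(s) = f_0 + g^\top s + \tfrac{1}{2} H[s]^2 + \tfrac{1}{6} T[s]^3 + \tfrac{\sigma}{4}\|s\|^4$ with symmetric $H$ and $T$; this form and the names `f0`, `g`, `H`, `T`, `sigma`, and `symmetrize3` are illustrative assumptions, not taken from the abstract.

```python
import itertools
import numpy as np

def symmetrize3(A):
    """Average a 3-way array over all index permutations (hypothetical helper)."""
    perms = list(itertools.permutations(range(3)))
    return sum(np.transpose(A, p) for p in perms) / len(perms)

def m3(s, f0, g, H, T, sigma):
    """Assumed model: f0 + g.s + 1/2 H[s]^2 + 1/6 T[s]^3 + sigma/4 ||s||^4."""
    quadratic = 0.5 * (s @ H @ s)
    cubic = np.einsum("ijk,i,j,k->", T, s, s, s) / 6.0
    quartic = 0.25 * sigma * np.dot(s, s) ** 2
    return f0 + g @ s + quadratic + cubic + quartic

def grad_m3(s, g, H, T, sigma):
    """Gradient of the assumed model; requires symmetric H and T."""
    tensor_term = 0.5 * np.einsum("ijk,j,k->i", T, s, s)
    return g + H @ s + tensor_term + sigma * np.dot(s, s) * s

def first_order_error(s, g, H, T, sigma):
    """Norm of the gradient, the quantity driven below the accuracy epsilon."""
    return float(np.linalg.norm(grad_m3(s, g, H, T, sigma)))
```

A sub-problem solver would stop once `first_order_error` falls below the target accuracy $\epsilon$. The sketch only evaluates the model; it does not implement QQR.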

Paper Structure (26 sections, 19 theorems, 88 equations, 4 figures, 9 tables, 3 algorithms)

Introduction and Problem Set-up
Minimizing the Quartically-regularised Cubic Polynomial $m_3$
Related Work Regarding Minimization of Quartic Polynomials
Motivation for Our Work
The QQR Method
QQR Variant 1: Single Adaptive Model Parameter
Upper and Lower Bounds in $\mathcal{D}^{(i)}(q)$
Linear Convergence of QQR-v1 in $\mathcal{D}^{(i)}(q)$
QQR Variant 2: Two Adaptive Model Parameters
Bound for Locally Strictly Convex and Nearly Convex Iterations
Bound for Locally Nonconvex Iterations
Complexity and Global Convergence of QQR
Complexity Bound for Iterations with $\lambda_{\min}[H_i] \le -\lambda_c$
Complexity Bound for Iterations with $\lambda_{\min}[H_i] \ge \lambda_c$
Overall Complexity Bound
...and 11 more sections

Key Result

Theorem 1.1

(Theorem 8.2.8 in cartis2022evaluation) Let $r \ge 3$. Any global minimizer $s_*^r$ of $m_2^r(s)$ satisfies $(\hat{H} + \hat{\lambda}_* I_n) s_*^r = -\hat{g}$, where $I_n \in \mathbb{R}^{n \times n}$ is the identity matrix, $\hat{\lambda}_* \ge 0$, $\hat{H} + \hat{\lambda}_* I_n \succeq 0$, and ...

Figures (4)

Figure 1: Minimizing $m_3(s) = 10s-50s^2+5s^3+5s^4$. $m_3$, $M_{\alpha^+}$, $M_{\alpha^-}$ plotted in black, green, and red respectively. $M_{\alpha^-}$ is bounded below on each iteration but the lower bound is outside the range of the plots.
Figure 2: The parameters are $a=80$, $b=80$, $c=80$, $\sigma=80$, and $n=100$.
Figure 3: Performance profile plots for the three methods. $\tau_m \in [1,10]$ is used for iteration and evaluation counts, while $\tau_m \in [1,10^3]$ is used for CPU time. A zoomed-in plot for $\tau_m \in [1,3]$ for iteration and evaluation counts, and $\tau_m \in [1,100]$ for CPU time, is provided in the appendix on numerical results (Figure 4).
Figure 4: Performance profile plots for three methods; $\tau_m \in [1,3]$ for iteration and evaluation counts and $\tau_m \in [1,100]$ for CPU time.
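The univariate polynomial in the Figure 1 caption, $m_3(s) = 10s-50s^2+5s^3+5s^4$, is small enough to minimize directly, which gives a reference point for what an iterative scheme should reach. The sketch below finds the real critical points from the roots of the derivative and picks the one with the lowest value; it is a brute-force check using NumPy's `np.roots`, not the QQR iteration, and the function names are illustrative.

```python
import numpy as np

def m3_1d(s):
    """The one-dimensional example from the Figure 1 caption."""
    return 10 * s - 50 * s**2 + 5 * s**3 + 5 * s**4

def dm3_1d(s):
    """Derivative of m3_1d: 10 - 100 s + 15 s^2 + 20 s^3."""
    return 10 - 100 * s + 15 * s**2 + 20 * s**3

def critical_points_1d():
    """Real roots of the derivative, sorted in increasing order."""
    roots = np.roots([20.0, 15.0, -100.0, 10.0])  # highest degree first
    real = roots[np.abs(roots.imag) < 1e-9].real
    return np.sort(real)

def global_min_1d():
    """Critical point with the smallest value of m3_1d, and that value."""
    points = critical_points_1d()
    best = min(points, key=m3_1d)
    return float(best), float(m3_1d(best))

if __name__ == "__main__":
    s_star, value = global_min_1d()
    print(f"critical points: {critical_points_1d()}")
    print(f"global minimizer: {s_star:.6f}, value: {value:.6f}")
```

Because the quartic coefficient is positive, the polynomial is bounded below and its global minimizer is one of these critical points. The example is nonconvex: it has two local minimizers separated by a local maximizer.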

Theorems & Definitions (59)

Theorem 1.1
Remark 1.1
Example 1.2
Remark 2.1
Remark 2.2
Definition 2.1
Lemma 2.1
proof
Remark 2.3
Theorem 2.2
...and 49 more
