Second-order methods for quartically-regularised cubic polynomials, with applications to high-order tensor methods

Coralia Cartis, Wenqi Zhu

TL;DR

This work develops the Quadratic Quartic Regularization (QQR) framework to efficiently minimize nonconvex quartically-regularised cubic polynomials arising in high-order tensor methods. By approximating the AR$3$ subproblem with a sequence of (possibly nonconvex) quadratic models augmented by quartic regularization, QQR achieves favorable complexity, including linear convergence in locally convex regions and robust progress in nonconvex settings. Two practical variants are analyzed: QQR-v1 with a single adaptive parameter and QQR-v2 with two adaptive parameters, each providing provable bounds and descent properties and connecting to Nesterov's linear convergence results in the convex case. Numerical experiments show QQR variants competitive with state-of-the-art ARC/AR$p$ methods, often reducing function evaluations and obtaining lower minima, particularly in tensor-dominant or ill-conditioned scenarios. Overall, QQR advances practical high-order optimization by delivering theoretically grounded, efficient solvers tailored to the AR$3$ subproblem.

Abstract

There has been growing interest in high-order tensor methods for nonconvex optimization, with adaptive regularization, as they possess better/optimal worst-case evaluation complexity globally and faster convergence asymptotically. These algorithms crucially rely on repeatedly minimizing nonconvex multivariate Taylor-based polynomial sub-problems, at least locally. Finding efficient techniques for the solution of these sub-problems, beyond the second-order case, has been an open question. This paper proposes a second-order method, Quadratic Quartic Regularisation (QQR), for efficiently minimizing nonconvex quartically-regularized cubic polynomials, such as the AR$p$ sub-problem [3] with $p=3$. Inspired by [35], QQR approximates the third-order tensor term by a linear combination of quadratic and quartic terms, yielding (possibly nonconvex) local models that are solvable to global optimality. To achieve accuracy $\epsilon$ in the first-order criticality of the sub-problem in finitely many iterations, we show that the error in the QQR method decreases either linearly or by at least $\mathcal{O}(\epsilon^{4/3})$ on locally convex iterations, and by at least $\mathcal{O}(\epsilon)$ on nonconvex iterations, thus improving, on these types of iterations, the general cubic-regularization bound. Preliminary numerical experiments indicate that two QQR variants perform competitively with state-of-the-art approaches such as ARC (also known as AR$p$ with $p=2$), achieving either a lower objective value or a lower iteration count.
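
For orientation, a quartically-regularised cubic polynomial of the kind described above is commonly written in the form below. The symbols $f_0$, $g$, $H$, $T$ and $\sigma$ are notation assumed here for illustration; the abstract itself does not fix them.

```latex
m_3(s) = f_0 + g^T s + \frac{1}{2} H[s]^2 + \frac{1}{6} T[s]^3 + \frac{\sigma}{4} \|s\|^4,
\qquad s \in \mathbb{R}^n, \quad \sigma > 0,
```

Here $g$ would play the role of a gradient, $H$ a symmetric (possibly indefinite) Hessian, $T$ a symmetric third-order tensor, and $\sigma$ the regularisation weight. Under this assumed notation, the univariate polynomial $10s-50s^2+5s^3+5s^4$ from the Figure 1 caption corresponds to $f_0=0$, $g=10$, $H=-100$, $T=30$ and $\sigma=20$.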

Paper Structure

This paper contains 26 sections, 19 theorems, 88 equations, 4 figures, 9 tables, and 3 algorithms.

Key Result

Theorem 1.1

(Theorem 8.2.8 in cartis2022evaluation) Let $r \ge 3$. Any global minimizer $s_*^r$ of $m_2^r(s)$ satisfies $(\hat{H} + \hat{\lambda}_* I_n) s_*^r = -\hat{g}$, where $I_n \in \mathbb{R}^{n \times n}$ is the identity matrix, $\hat{\lambda}_* \ge 0$, $\hat{H} + \hat{\lambda}_* I_n \succeq 0$, and ...
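
The conditions in this theorem can be checked numerically on a small example. The sketch below rests on two assumptions that the truncated statement does not confirm: that the model is $m_2^r(s) = \hat{g}^T s + \frac{1}{2} s^T \hat{H} s + \frac{\sigma}{r}\|s\|^r$ with $r=4$, and that the multiplier satisfies the commonly stated relation $\hat{\lambda}_* = \sigma \|s_*^r\|^{r-2}$. The function names and test data are hypothetical, and the degenerate "hard case" is not handled.

```python
import numpy as np

def model(s, g, H, sigma):
    # Assumed model form: g^T s + 0.5 s^T H s + (sigma/4) ||s||^4 (r = 4).
    return g @ s + 0.5 * s @ H @ s + 0.25 * sigma * np.linalg.norm(s) ** 4

def minimize_regularized_quadratic(g, H, sigma, tol=1e-12):
    """Solve (H + lam I) s = -g with lam = sigma ||s||^2 by bisection on lam.

    Assumes the generic case: g is not orthogonal to the eigenvectors of the
    smallest eigenvalue of H.
    """
    eigvals, eigvecs = np.linalg.eigh(H)  # eigenvalues in ascending order
    g_rot = eigvecs.T @ g                 # g expressed in the eigenbasis

    def phi(lam):
        # sigma ||s(lam)||^2 - lam; strictly decreasing for lam > -eigvals[0].
        return sigma * np.sum((g_rot / (eigvals + lam)) ** 2) - lam

    lo = max(0.0, -eigvals[0]) + 1e-10    # keeps H + lam I positive definite
    hi = lo + 1.0
    while phi(hi) > 0:                    # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if phi(mid) > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    s = -eigvecs @ (g_rot / (eigvals + lam))
    return s, lam

# Hypothetical nonconvex test problem: H has a negative eigenvalue.
H = np.diag([-2.0, 1.0, 3.0])
g = np.ones(3)
sigma = 1.0
s_star, lam_star = minimize_regularized_quadratic(g, H, sigma)

print("lambda_*:", lam_star)
print("residual of (H + lam I) s = -g:",
      np.linalg.norm((H + lam_star * np.eye(3)) @ s_star + g))
print("smallest eigenvalue of H + lam I:",
      np.linalg.eigvalsh(H + lam_star * np.eye(3)).min())
```

Working in the eigenbasis of $\hat{H}$ makes each evaluation of the scalar equation cheap after a single factorisation, which is why bisection on $\hat{\lambda}$ is used here in place of a multivariate solver.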

Figures (4)

  • Figure 1: Minimizing $m_3(s) = 10s-50s^2+5s^3+5s^4$. $m_3$, $M_{\alpha^+}$, $M_{\alpha^-}$ plotted in black, green, and red respectively. $M_{\alpha^-}$ is bounded below on each iteration but the lower bound is outside the range of the plots.
  • Figure 2: The parameters are $a=80$, $b=80$, $c=80$, $\sigma=80$, and $n=100$.
  • Figure 3: Performance profile plots for the three methods. $\tau_m \in [1,10]$ is used for iteration and evaluation counts, while $\tau_m \in [1,10^3]$ is used for CPU time. A zoomed-in plot, with $\tau_m \in [1,3]$ for iteration and evaluation counts and $\tau_m \in [1,100]$ for CPU time, is provided in the appendix (Figure 4).
  • Figure 4: Performance profile plots for three methods; $\tau_m \in [1,3]$ for iteration and evaluation counts and $\tau_m \in [1,100]$ for CPU time.
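
The univariate polynomial in the Figure 1 caption is small enough to minimise directly, which gives a reference point for the plotted iterations. The sketch below is not the QQR method and does not build the models $M_{\alpha^+}$ or $M_{\alpha^-}$; it only finds the real critical points of $m_3$ with NumPy and picks the one with the lowest value. The variable names are chosen here for illustration.

```python
import numpy as np

# Coefficients of m_3(s) = 5 s^4 + 5 s^3 - 50 s^2 + 10 s, highest degree first.
m3_coeffs = np.array([5.0, 5.0, -50.0, 10.0, 0.0])

def m3(s):
    return np.polyval(m3_coeffs, s)

# Critical points are the real roots of m_3'(s) = 20 s^3 + 15 s^2 - 100 s + 10.
dm3_coeffs = np.polyder(m3_coeffs)
roots = np.roots(dm3_coeffs)
critical_points = np.sort(roots[np.abs(roots.imag) < 1e-9].real)

# The quartic term dominates for large |s|, so m_3 is bounded below and its
# global minimizer is the critical point with the smallest function value.
s_star = critical_points[np.argmin(m3(critical_points))]

for s in critical_points:
    print(f"critical point s = {s:.6f}, m_3(s) = {m3(s):.6f}")
print(f"global minimizer s_* = {s_star:.6f}")
```

This polynomial has two local minimizers separated by a local maximizer near the origin, so a purely local method started at $s=0$ may settle in either basin depending on its model and step rule.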

Theorems & Definitions (59)

  • Theorem 1.1
  • Remark 1.1
  • Example 1.2
  • Remark 2.1
  • Remark 2.2
  • Definition 2.1
  • Lemma 2.1
  • proof
  • Remark 2.3
  • Theorem 2.2
  • ...and 49 more