Examples of slow convergence for adaptive regularization optimization methods are not isolated

Philippe L. Toint

TL;DR

The paper analyzes adaptive regularization methods for unconstrained nonconvex optimization, focusing on the second-order scheme AR2 and its worst-case evaluation complexity of $\mathcal{O}\left(\epsilon^{-\frac{3}{3-q}}\right)$ for obtaining an $\epsilon$-approximate critical point of order $q\in\{1,2\}$. It extends existing sharpness results by constructing a parametric family of one-dimensional, twice continuously differentiable, piecewise-polynomial functions $f_{\mathcal{A},\mathcal{B}}$ with Lipschitz continuous Hessian that interpolate the data generated by AR2 (the case $p=2$ of the AR$p$ family) and on which the method converges slowly, requiring $k_\epsilon=\left\lceil\epsilon^{-\frac{3}{3-q}}\right\rceil$ iterations. The key contribution is showing that such slow-convergence instances are not isolated but form a set of nonzero measure in function space, a consequence of the freedom in choosing the interpolation perturbations $\mathcal{A}$ and $\mathcal{B}$. The construction is consistent with existing complexity sharpness results and clarifies how it differs from related asymptotic findings. Overall, the work broadens the understanding of the worst-case behavior of adaptive regularization methods, showing that slow convergence has a rich structure, with implications for what can be expected of these methods in practice on nonconvex problems.
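
The iteration count $k_\epsilon=\left\lceil\epsilon^{-\frac{3}{3-q}}\right\rceil$ grows very quickly as $\epsilon$ decreases. The short Python sketch below simply evaluates this formula; the function name `slow_iteration_count` is hypothetical and not taken from the paper.

```python
import math


def slow_iteration_count(eps: float, q: int) -> int:
    """Evaluate k_eps = ceil(eps ** (-3 / (3 - q))) for criticality order q in {1, 2}."""
    if q not in (1, 2):
        raise ValueError("q must be 1 or 2")
    if eps <= 0.0:
        raise ValueError("eps must be positive")
    # The exponent is -3/2 for first-order (q=1) and -3 for second-order (q=2) points.
    return math.ceil(eps ** (-3.0 / (3 - q)))


print(slow_iteration_count(1e-5, 1))  # prints 31622777
```

For $\epsilon=10^{-5}$ and $q=1$ (the setting of Figure 1), the formula gives $\lceil 10^{7.5}\rceil$ iterations, while $q=2$ with the same $\epsilon$ would give $\epsilon^{-3}=10^{15}$.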

Abstract

The adaptive regularization algorithm for unconstrained nonconvex optimization was shown in Nesterov and Polyak (2006) and Cartis, Gould and Toint (2011) to require, under standard assumptions, at most $\mathcal{O}(\epsilon^{-3/(3-q)})$ evaluations of the objective function and its derivatives of degrees one and two to produce an $\epsilon$-approximate critical point of order $q\in\{1,2\}$. This bound was shown to be sharp for $q \in\{1,2\}$. This note revisits these results and shows that the example exhibiting slow convergence is not isolated: this behaviour occurs for a subset of univariate functions of nonzero measure.
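
To make the algorithm concrete, here is a minimal Python sketch of an AR2-type iteration for a univariate $f$, assuming the standard cubic-regularized model $m(s)=f(x)+f'(x)s+\tfrac12 f''(x)s^2+\tfrac{\sigma}{3}|s|^3$, a first-order stopping test ($q=1$), and a simplified update of the regularization weight $\sigma$. The function names, parameter names (`eta1`, `eta2`, `gamma_dec`, `gamma_inc`, `sigma_min`) and their default values are illustrative assumptions, not the paper's.

```python
import math
from typing import Callable, Tuple


def cubic_model_minimizer(g: float, H: float, sigma: float) -> float:
    """Global minimizer of m(s) = g*s + H*s**2/2 + sigma*|s|**3/3, with sigma > 0."""

    def m(s: float) -> float:
        return g * s + 0.5 * H * s * s + sigma * abs(s) ** 3 / 3.0

    candidates = [0.0]
    for sign in (1.0, -1.0):
        # On the branch s = sign*t (t > 0), m'(s) = 0 reads sigma*t**2 + H*t + sign*g = 0.
        disc = H * H - 4.0 * sigma * sign * g
        if disc >= 0.0:
            root = math.sqrt(disc)
            for t in ((-H + root) / (2.0 * sigma), (-H - root) / (2.0 * sigma)):
                if t > 0.0:
                    candidates.append(sign * t)
    # m is smooth and coercive, so its global minimizer is among these stationary candidates.
    return min(candidates, key=m)


def ar2_first_order(
    f: Callable[[float], float],
    df: Callable[[float], float],
    d2f: Callable[[float], float],
    x0: float,
    eps1: float,
    sigma0: float = 1.0,
    eta1: float = 0.1,
    eta2: float = 0.9,
    gamma_dec: float = 0.5,
    gamma_inc: float = 2.0,
    sigma_min: float = 1e-8,
    max_iter: int = 10_000,
) -> Tuple[float, int]:
    """Simplified AR2 loop stopping when |f'(x)| <= eps1 (q = 1); returns (x, iterations)."""
    x, sigma = x0, sigma0
    fx, g, H = f(x), df(x), d2f(x)
    for k in range(max_iter):
        if abs(g) <= eps1:
            return x, k
        s = cubic_model_minimizer(g, H, sigma)
        predicted = -(g * s + 0.5 * H * s * s + sigma * abs(s) ** 3 / 3.0)  # f(x) - m(s) > 0
        f_trial = f(x + s)
        rho = (fx - f_trial) / predicted
        if rho >= eta1:  # successful iteration: accept the step, re-evaluate derivatives
            x, fx = x + s, f_trial
            g, H = df(x), d2f(x)
        if rho >= eta2:  # very successful: decrease the regularization weight
            sigma = max(sigma_min, gamma_dec * sigma)
        elif rho < eta1:  # unsuccessful: increase the regularization weight
            sigma *= gamma_inc
    return x, max_iter


x_star, iters = ar2_first_order(lambda x: x * x, lambda x: 2.0 * x, lambda x: 2.0, x0=1.0, eps1=1e-6)
```

On $f(x)=x^2$ every iteration is very successful, because the cubic term makes the model an overestimate of $f$; the slow-convergence examples in the paper are precisely functions on which this kind of loop needs many iterations.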

Paper Structure (4 sections, 1 theorem, 54 equations, 1 figure)

Key Result

Theorem 2.1

Under the assumptions on $f$ stated at the beginning of this section and given a criticality order $q\in\{1,2\}$, the AR2 algorithm requires at most $\mathcal{O}\left(\epsilon^{-\frac{3}{3-q}}\right)$ evaluations of $f$ and its derivatives of orders one and two to produce an iterate $x_\epsilon$ such that $\phi_{f,j}(x_\epsilon)\leq \epsilon_j /j$ for $j\in\{1,2\}$.
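
The stopping rule is expressed through the optimality measures $\phi_{f,j}$, whose definition is not reproduced in this summary. The Python sketch below assumes the unit-radius form common in the adaptive regularization literature, $\phi_{f,j}(x) = f(x) - \min_{|d|\le 1} T_{f,j}(x,d)$, where $T_{f,j}(x,d)$ is the degree-$j$ Taylor expansion of $f$ at $x$ (so that $\epsilon_j/j!=\epsilon_j/j$ for $j\le 2$); this definition and the function names are assumptions.

```python
def phi(g: float, H: float, j: int) -> float:
    """phi_{f,j}(x) = f(x) - min_{|d|<=1} T_{f,j}(x, d) for univariate f (unit radius assumed)."""
    if j == 1:
        return abs(g)  # the linear model g*d is minimized at d = -sign(g)
    if j == 2:
        candidates = [-1.0, 0.0, 1.0]
        if H > 0.0 and abs(g) <= H:
            candidates.append(-g / H)  # interior minimizer of g*d + H*d**2/2
        return max(-(g * d + 0.5 * H * d * d) for d in candidates)
    raise ValueError("j must be 1 or 2")


def satisfies_stopping_rule(g: float, H: float, eps1: float, eps2: float) -> bool:
    """Check phi_{f,j}(x) <= eps_j / j for j in {1, 2}, given g = f'(x) and H = f''(x)."""
    return phi(g, H, 1) <= eps1 and phi(g, H, 2) <= eps2 / 2.0


# A local minimizer passes, while a local maximizer (f' = 0, f'' < 0) is rejected by phi_{f,2}.
print(satisfies_stopping_rule(0.0, 2.0, 1e-3, 1e-3))   # prints True
print(satisfies_stopping_rule(0.0, -2.0, 1e-3, 1e-3))  # prints False
```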

Figures (1)

  • Figure 1: A few members of the set of functions causing slow convergence of the adaptive regularization algorithm ($\epsilon = 10^{-5}$, $q=1$, showing the first 15 iterations)
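
The functions in Figure 1 are piecewise polynomials that interpolate the values and first and second derivatives generated at consecutive AR2 iterates. A generic way to build such a twice continuously differentiable interpolant is quintic Hermite interpolation on each interval, sketched below in Python. It is an assumption that this matches the paper's construction exactly; in particular, the perturbations $\mathcal{A},\mathcal{B}$ are not modeled, and `quintic_hermite` is a hypothetical name.

```python
from typing import Callable, Tuple

Jet = Tuple[float, float, float]  # (f, f', f'') at a point


def quintic_hermite(x0: float, d0: Jet, x1: float, d1: Jet) -> Callable[[float], Jet]:
    """Quintic p on [x0, x1] with (p, p', p'') equal to d0 at x0 and to d1 at x1.

    Generic building block only; the paper's perturbations A, B are not modeled here.
    """
    h = x1 - x0
    f0, g0, H0 = d0
    f1, g1, H1 = d1
    # Residuals of the degree-2 Taylor model at x0, scaled by powers of h.
    r0 = f1 - (f0 + g0 * h + 0.5 * H0 * h * h)
    r1 = (g1 - (g0 + H0 * h)) * h
    r2 = (H1 - H0) * h * h
    # Closed-form solution of the 3x3 system for the t^3, t^4, t^5 coefficients.
    c3 = (10.0 * r0 - 4.0 * r1 + 0.5 * r2) / h**3
    c4 = (-15.0 * r0 + 7.0 * r1 - r2) / h**4
    c5 = (6.0 * r0 - 3.0 * r1 + 0.5 * r2) / h**5

    def p(x: float) -> Jet:
        t = x - x0
        value = f0 + g0 * t + 0.5 * H0 * t**2 + c3 * t**3 + c4 * t**4 + c5 * t**5
        slope = g0 + H0 * t + 3.0 * c3 * t**2 + 4.0 * c4 * t**3 + 5.0 * c5 * t**4
        curvature = H0 + 6.0 * c3 * t + 12.0 * c4 * t**2 + 20.0 * c5 * t**3
        return value, slope, curvature

    return p


# Data taken from f(x) = x**3 at x = 1 and x = 3; the cubic is reproduced exactly.
p = quintic_hermite(1.0, (1.0, 3.0, 6.0), 3.0, (27.0, 27.0, 18.0))
print(p(2.0))  # prints (8.0, 12.0, 12.0)
```

Gluing such pieces across consecutive iterates yields a $C^2$ function; provided the third derivatives of the pieces stay uniformly bounded, its Hessian is Lipschitz continuous.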

Theorems & Definitions (1)

  • Theorem 2.1