Examples of slow convergence for adaptive regularization optimization methods are not isolated

Philippe L. Toint

TL;DR

The paper analyzes adaptive regularization methods for unconstrained nonconvex optimization, focusing on the AR2 scheme and its worst-case evaluation complexity of $\mathcal{O}\left(\epsilon^{-\frac{3}{3-q}}\right)$ to obtain an $\epsilon$-approximate $q$-order critical point for $q\in\{1,2\}$. It extends existing sharpness results by constructing a parametric family of one-dimensional, twice continuously differentiable piecewise-polynomial functions $f_{\mathcal{A},\mathcal{B}}$ with Lipschitz continuous Hessian that interpolate AR2 data and exhibit slow convergence when $p=2$, requiring $k_\epsilon=\left\lceil\epsilon^{-\frac{3}{3-q}}\right\rceil$ iterations. The key contribution is showing that such slow-convergence instances are not isolated but occupy a set of nonzero measure in function space, which is made possible by the freedom in the interpolation perturbations $\mathcal{A},\mathcal{B}$. The construction is consistent with existing complexity sharpness results, and the paper clarifies how it differs from related asymptotic findings. Overall, the work shows that the worst-case behavior of adaptive regularization methods is not an artifact of a single pathological example, which bears on what can be expected in practice for nonconvex optimization.

Abstract

The adaptive regularization algorithm for unconstrained nonconvex optimization was shown in Nesterov and Polyak (2006) and Cartis, Gould and Toint (2011) to require, under standard assumptions, at most $\mathcal{O}(\epsilon^{-3/(3-q)})$ evaluations of the objective function and its derivatives of degrees one and two to produce an $\epsilon$-approximate critical point of order $q\in\{1,2\}$. This bound was shown to be sharp for $q\in\{1,2\}$. This note revisits these results and shows that the example for which slow convergence is exhibited is not isolated, but that this behaviour occurs for a subset of univariate functions of nonzero measure.
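To give a feel for the magnitudes involved, the following minimal sketch evaluates the iteration count $k_\epsilon=\left\lceil\epsilon^{-3/(3-q)}\right\rceil$ for a few accuracies. The function name `worst_case_iterations` and the restriction of `eps` to $(0,1)$ are assumptions made for illustration, not part of the paper.

```python
import math


def worst_case_iterations(eps: float, q: int) -> int:
    """Return ceil(eps ** (-3 / (3 - q))) for a criticality order q in {1, 2}.

    Hypothetical helper: the name and the input checks are illustrative only.
    """
    if q not in (1, 2):
        raise ValueError("q must be 1 or 2")
    if not 0.0 < eps < 1.0:
        raise ValueError("eps is assumed to lie in (0, 1)")
    # The exponent is -3/2 for q = 1 and -3 for q = 2.
    # Floating-point rounding can shift the result by one when the
    # exact power is an integer.
    return math.ceil(eps ** (-3.0 / (3 - q)))


if __name__ == "__main__":
    for eps in (1e-1, 1e-3, 1e-5):
        print(eps, worst_case_iterations(eps, 1), worst_case_iterations(eps, 2))
```

The count grows much faster for $q=2$ (exponent $-3$) than for $q=1$ (exponent $-3/2$), which is why second-order criticality is considerably more expensive to guarantee in the worst case.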

Paper Structure (4 sections, 1 theorem, 54 equations, 1 figure)

Introduction
The context
The example
Discussion

Key Result

Theorem 2.1

Under the assumptions on $f$ stated at the beginning of this section and given a criticality order $q\in\{1,2\}$, the AR2 algorithm requires at most $\mathcal{O}\left(\epsilon^{-\frac{3}{3-q}}\right)$ evaluations of $f$ and its derivatives of orders one and two to produce an iterate $x_\epsilon$ such that $\phi_{f,j}(x_\epsilon)\leq \epsilon_j /j$ for $j\in\{1,2\}$.
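For readers unfamiliar with the kind of iteration this theorem refers to, here is a minimal one-dimensional sketch of an adaptive cubic-regularization loop for first-order criticality ($q=1$). It is an illustration under stated assumptions, not the paper's exact AR2 algorithm: the names `ar2_step` and `ar2_1d`, the thresholds `eta1` and `eta2`, the update factors `gamma_dec` and `gamma_inc`, and the test function are all hypothetical choices. The step minimizes the model $g s + \tfrac12 h s^2 + \tfrac{\sigma}{6}|s|^3$ in closed form, and $\sigma$ is adapted from the ratio of achieved to predicted decrease.

```python
import math


def ar2_step(g: float, h: float, sigma: float) -> float:
    """Global minimizer of g*s + 0.5*h*s**2 + (sigma/6)*|s|**3 for g != 0."""
    root = math.sqrt(h * h + 2.0 * sigma * abs(g))
    # Step length t >= 0 solves -|g| + h*t + 0.5*sigma*t**2 = 0.
    # Two algebraically equivalent forms are used to avoid cancellation.
    t = 2.0 * abs(g) / (h + root) if h > 0.0 else (root - h) / sigma
    return -math.copysign(t, g)  # move against the sign of the gradient


def ar2_1d(f, df, d2f, x0, eps, sigma0=1.0, eta1=0.1, eta2=0.9,
           gamma_dec=0.5, gamma_inc=2.0, sigma_min=1e-8, max_iter=10_000):
    """Illustrative adaptive regularization loop; returns (x, iterations)."""
    x, sigma = x0, sigma0
    for k in range(max_iter):
        g, h = df(x), d2f(x)
        if abs(g) <= eps:  # approximate first-order critical point found
            return x, k
        s = ar2_step(g, h, sigma)
        predicted = -(g * s + 0.5 * h * s * s)  # decrease of the Taylor model
        rho = (f(x) - f(x + s)) / predicted
        if rho >= eta1:  # successful step: accept the trial point
            x = x + s
        if rho >= eta2:  # very successful: relax the regularization
            sigma = max(sigma_min, gamma_dec * sigma)
        elif rho < eta1:  # unsuccessful: regularize more strongly
            sigma = gamma_inc * sigma
    return x, max_iter


if __name__ == "__main__":
    # Hypothetical smooth test function with minimizers at x = -1 and x = 1.
    f = lambda x: 0.25 * x**4 - 0.5 * x**2
    df = lambda x: x**3 - x
    d2f = lambda x: 3.0 * x**2 - 1.0
    print(ar2_1d(f, df, d2f, x0=2.0, eps=1e-6))
```

On a well-behaved function such as this one the loop terminates in a handful of iterations; the point of the paper is that there is a set of functions of nonzero measure on which methods of this type need a number of iterations of the order of the worst-case bound.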

Figures (1)

Figure 1: A few members of the set of functions causing slow convergence of the adaptive regularization algorithm ($\epsilon = 10^{-5}$, $q=1$, showing the first 15 iterations)

Theorems & Definitions (1)

Theorem 2.1
