
Superlinear convergence in nonsmooth optimization via higher-order cutting-plane models

Bennet Gebken, Michael Ulbrich

Abstract

A cutting-plane model for a nonsmooth function is the maximum of several first-order expansions centered at different points. Using such a model in a bundle method leads to linear convergence (of serious steps) to a minimum. In smooth optimization, superlinear convergence can be achieved by using higher-order models. We show that the same is true for the nonsmooth case, i.e., we show that cutting-plane models involving higher-order expansions can be used to achieve superlinear convergence in nonsmooth optimization. We first formally define higher-order cutting-plane models for lower-$C^2$ functions and derive an error estimate. Afterwards, we construct a trust-region bundle method based on these models that achieves local superlinear convergence of serious steps, and overall superlinear convergence for certain finite max-type functions. Finally, we verify the superlinear convergence in numerical experiments.
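As a point of reference for the first sentence of the abstract, the following is a minimal sketch of a classical first-order cutting-plane model in one dimension. It is not the authors' higher-order construction. The test function `f`, the helper names `subgradient` and `cutting_plane_model`, and the chosen centers are illustrative assumptions that do not appear in the paper.

```python
def f(x):
    """Convex, nonsmooth test function (assumed for illustration only)."""
    return abs(x) + 0.5 * x * x


def subgradient(x):
    """Return one element of the subdifferential of f at x (0 is chosen at the kink)."""
    sign = (x > 0) - (x < 0)
    return sign + x


def cutting_plane_model(centers):
    """Build the model m(x) = max_i [ f(y_i) + g_i * (x - y_i) ] over the centers y_i."""
    # Each cut stores the center, the function value, and a subgradient there.
    cuts = [(y, f(y), subgradient(y)) for y in centers]

    def model(x):
        # Maximum of the first-order expansions centered at the different points.
        return max(fy + g * (x - y) for y, fy, g in cuts)

    return model


# Hypothetical centers; any finite, nonempty set of points works.
centers = [-1.0, -0.25, 0.5, 1.5]
model = cutting_plane_model(centers)
```

Because this `f` is convex, each linear cut lies below `f`, so the model underestimates `f` everywhere and matches it at the centers. The paper replaces the first-order expansions by higher-order ones; that step is not reproduced here.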

Paper Structure

This paper contains 9 sections, 8 theorems, 49 equations, 6 figures, 3 algorithms.

Key Result

Lemma 3.1

Let $q \in \mathbb{N}$ and assume that $f : U \rightarrow \mathbb{R}$ satisfies Assumption (A1). Then for every bounded set $V \subseteq U$ and every $\varepsilon_{max} > 0$ with $\mathrm{cl}(V + \bar{B}_{\varepsilon_{max}}(0)) \subseteq U$, there is some $K \geq 0$ such that for all $x \in V$, $\varepsilon \in [0,\varepsilon_{max}]$, and finite, nonempty sets $W \subseteq \bar{B}_\varep

Figures (6)

  • Figure 1: Higher-order cutting-plane models (red) for the nonconvex function $f : \mathbb{R} \rightarrow \mathbb{R}$, $x \mapsto \max(\{ -(x + 0.5)^2 + 0.25 |x|^{3/2} + 0.5, x^2 + 0.5 |x|^{3/2} - 0.25, -1/(|x| + 0.25) + 2 \})$ for different orders $q$ of Taylor expansions (dashed) and the centers $W = \{ -1.2, -0.9, -0.3, 0.75, 1.25 \}$ (dots). The different colors for the Taylor expansions correspond to different centers.
  • Figure 2: (a) The distance $\| x^j - x^* \|$ (black) for sequences $(x^j)_j$ generated by Alg. \ref{algo:local_method} with varying order $q$ of Taylor expansion in Ex. \ref{example:1d_symbolic}. The red, dotted lines show, depending on the marker, the corresponding upper bound $(\varepsilon_j)_j$ from Thm. \ref{thm:local_method_convergence}. (b) The distance $\| x^j - x^* \|$ in Ex. \ref{example:LW2019_85} and the corresponding sequence $(\varepsilon_j)_j$.
  • Figure 3: (a) The number of oracle calls required by Alg. \ref{algo:approx_W} in each iteration of Alg. \ref{algo:local_method} in Ex. \ref{example:LW2019_85}. (b) The distance $\| x^{j(l)} - x^* \|$ with $(x^{j(l)})_l$ as in Cor. \ref{cor:N_step_convergence}.
  • Figure 4: (a) The distance $\| x^j - \tilde{x}^* \|$ in Ex. \ref{example:LW2019_eigval} and the corresponding sequence $(\varepsilon_j)_j$. (b) The distance of the objective values to the reference value $f(\tilde{x}^*)$ with respect to oracle calls for Alg. \ref{algo:local_method} and HANSO.
  • Figure 5: (a) The distance $\| x^j - x^* \|$ in Ex. \ref{example:halfhalf} and the corresponding sequence $(\varepsilon_j)_j$. (b) The distance of the objective values to the optimal value $f(x^*)$ with respect to oracle calls for Alg. \ref{algo:local_method} and VUbundle. (The zoom on the result of Alg. \ref{algo:local_method} shows that it is not a descent method.)
  • ...and 1 more figure

Theorems & Definitions (23)

  • Definition 2.1
  • Lemma 3.1
  • proof
  • Lemma 3.2
  • proof
  • Lemma 4.1
  • proof
  • Remark 4.2
  • Lemma 4.3
  • proof
  • ...and 13 more