Using second-order information in gradient sampling methods for nonsmooth optimization

Bennet Gebken

TL;DR

The paper tackles nonsmooth optimization by introducing the second-order ε-jet, a second-order analogue of the Goldstein ε-subdifferential, to build a local model $\mathcal{T}_{x,\varepsilon}$ as the maximum of second-order Taylor expansions within the ε-ball $B_\varepsilon(x)$. It proves the existence and favorable properties of the jet $\mathcal{J}_\varepsilon^2 f(x)$ and derives error bounds $R_{x,\varepsilon}(z)$ that decrease with higher-order information, yielding cubic and quadratic error rates for max-type and convex functions, respectively. A descent method is developed that minimizes the model on a trust-region-like ball, with an abstract version requiring the full jet and a practical version that approximates the jet via a deterministic sampling scheme; convergence is established for convex or max-type objectives. Numerical experiments (SOGS) show strong performance with respect to oracle calls on general nonsmooth problems and offer a practical counterpart to superlinear solvers on special problem classes, albeit with higher per-iteration cost due to solving a QCQP. The work provides a theoretical framework and computational approach for leveraging second-order information in nonsmooth optimization, with open directions for improving subproblem efficiency and integrating quasi-Newton ideas.
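For reference, the model and its trust-region-like subproblem can be written out as follows. This is a sketch under assumed notation: the tuple $(y, f_y, g, H)$ for an element of $\mathcal{J}_\varepsilon^2 f(x)$ (expansion point, value, gradient, Hessian), the finite sample $W$, the epigraph variable $t$, and the choice of $B_\varepsilon(x)$ as the ball are illustrative assumptions, and the paper's exact definitions may differ.

$$
\mathcal{T}_{x,\varepsilon}(z) = \max_{(y, f_y, g, H) \in \mathcal{J}_\varepsilon^2 f(x)} \Big( f_y + g^\top (z - y) + \tfrac{1}{2} (z - y)^\top H (z - y) \Big)
$$

If the jet is replaced by a finite sample $W$, minimizing the model over the ball can be stated in epigraph form:

$$
\min_{z,\, t} \; t \quad \text{s.t.} \quad f_y + g^\top (z - y) + \tfrac{1}{2} (z - y)^\top H (z - y) \le t \;\; \text{for all } (y, f_y, g, H) \in W, \qquad \|z - x\|^2 \le \varepsilon^2 .
$$

Every constraint is at most quadratic in $(z, t)$, which is consistent with the QCQP named above as the main per-iteration cost.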

Abstract

In this article, we introduce a novel concept for second-order information of a nonsmooth function inspired by the Goldstein ε-subdifferential. It comprises the coefficients of all existing second-order Taylor expansions in an ε-ball around a given point. Based on this concept, we define a model of the objective as the maximum of these Taylor expansions, and derive a sampling scheme for its approximation in practice. Minimization of this model induces a simple descent method, for which we show convergence for the case where the objective is convex or of max-type. While we do not prove any rate of convergence of this method, numerical experiments suggest superlinear behavior with respect to the number of oracle calls of the objective.
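The idea of taking the maximum of sampled second-order Taylor expansions can be illustrated with a minimal one-dimensional sketch. Everything in it is an assumption made for illustration and not taken from the paper: the test function `f`, the helper names `taylor_coefficients` and `sampled_model`, and the hand-picked sample points (the paper's deterministic sampling scheme is not reproduced here).

```python
def f(x):
    """Hypothetical max-type test function: the maximum of two smooth parabolas."""
    return max((x - 1.0) ** 2, (x + 1.0) ** 2)


def taylor_coefficients(y):
    """Return (y, f(y), f'(y), f''(y)) at a point y != 0, where f is smooth."""
    if y == 0.0:
        raise ValueError("f is not differentiable at 0; sample elsewhere")
    s = 1.0 if y > 0 else -1.0  # the active piece at y is (x + s)^2
    return (y, (y + s) ** 2, 2.0 * (y + s), 2.0)


def sampled_model(z, jet):
    """Maximum over the second-order Taylor expansions stored in `jet`, evaluated at z."""
    return max(fy + g * (z - y) + 0.5 * h * (z - y) ** 2 for (y, fy, g, h) in jet)


x, eps = 0.05, 0.1

# Assumed sample sets: one point on a single side of the kink at 0,
# versus one point on each side, all within distance eps of x.
one_sided = [taylor_coefficients(x)]
two_sided = [taylor_coefficients(x), taylor_coefficients(x - eps)]

z = -0.04  # a point inside the eps-ball, on the other side of the kink
print("f(z)            =", f(z))
print("one-sided model =", sampled_model(z, one_sided))
print("two-sided model =", sampled_model(z, two_sided))
```

With a sample on only one side of the kink, the model misses the second smooth piece and underestimates `f` at `z`. Once both sides are sampled, the model reproduces `f` up to rounding, because each piece of this particular function is itself quadratic. For general objectives the match would only be approximate.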

Paper Structure

This paper contains 13 sections, 13 theorems, 65 equations, 2 figures, 3 tables, and 3 algorithms.

Introduction
Preliminaries
Second-order epsilon-jet and model
  Second-order epsilon-jet
  Second-order model
Descent method
  Abstract algorithm
  Approximating the epsilon-jet
  Practical algorithm
Numerical experiments
  Performance on popular test problems
  Comparison to superlinear solvers for special problem classes
Conclusion

Key Result

Lemma 1

Assume that $f$ satisfies Assumption A1. Then $\mathcal{J}_\varepsilon^2 f(x)$ is nonempty and compact for all $x \in \mathbb{R}^n$ and $\varepsilon \geq 0$.

Figures (2)

Figure 1: (a) The graph of $f$ in Example \ref{example:semismooth}. (b) The sequences $(x^j)_j$ and $(\bar{z}(x^j,\varepsilon_j))_j$ generated by Algo. \ref{algo:abstract_descent_method} and \ref{algo:practical_descent_method}.
Figure 2: (a) The dots represent the iterates of SOGS (i.e., $(\hat{x}^l)_l$) and VUbundle for Problem \ref{eq:halfhalf}. The horizontal axis shows the number of $\partial f$ evaluations required up to each iterate and the vertical axis shows the distance (in logarithmic scale) of the objective value to the minimal value at each iterate. For SOGS, the red dots highlight the subsequence $(x^j)_j$ among $(\hat{x}^l)_l$, cf. \ref{eq:def_xj}, \ref{eq:def_xhatl}. (b) Same as (a) for the solver SuperPolyak and Problem \ref{eq:max_root}. (Not shown is the final iterate of SuperPolyak, which took 1274 subgradient evaluations and reached the optimal value up to machine precision.)

Theorems & Definitions (33)

Definition 1
Lemma 1
proof
Lemma 2
proof
Lemma 3
proof
Lemma 4
proof
Remark 1
...and 23 more
