Using second-order information in gradient sampling methods for nonsmooth optimization
Bennet Gebken
TL;DR
The paper tackles nonsmooth optimization by introducing the second-order ε-jet, a second-order analogue of the Goldstein ε-subdifferential, and uses it to build a local model $\mathcal{T}_{x,\varepsilon}$ as the maximum of the second-order Taylor expansions available within the ε-ball $B_\varepsilon(x)$. It proves the existence and favorable properties of the jet $\mathcal{J}_\varepsilon^2 f(x)$ and derives bounds on the model error $R_{x,\varepsilon}(z)$ that improve when higher-order information is used, yielding cubic and quadratic error rates for max-type and convex functions, respectively. A descent method is developed that minimizes the model on a trust-region-like ball. It comes in an abstract version that requires the full jet and a practical version that approximates the jet via a deterministic sampling scheme; convergence is established for convex or max-type objectives. Numerical experiments with the resulting method (SOGS) show strong performance with respect to oracle calls on general nonsmooth problems and offer a practical counterpart to superlinear solvers on special problem classes, albeit with a higher per-iteration cost because each iteration solves a QCQP. The work provides a theoretical framework and a computational approach for leveraging second-order information in nonsmooth optimization, with open directions for improving subproblem efficiency and integrating quasi-Newton ideas.
Abstract
In this article, we introduce a novel concept for second-order information of a nonsmooth function inspired by the Goldstein ε-subdifferential. It comprises the coefficients of all existing second-order Taylor expansions in an ε-ball around a given point. Based on this concept, we define a model of the objective as the maximum of these Taylor expansions, and derive a sampling scheme for its approximation in practice. Minimization of this model induces a simple descent method, for which we show convergence for the case where the objective is convex or of max-type. While we do not prove any rate of convergence of this method, numerical experiments suggest superlinear behavior with respect to the number of oracle calls of the objective.
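
To make the idea of a model built as a maximum of Taylor expansions concrete, the following is a minimal sketch and not the paper's implementation. It assumes a hypothetical max-type test function made of two quadratic pieces, and it uses two hand-picked sample points in place of the paper's deterministic sampling scheme. The names `pieces`, `oracle`, and `sampled_model` are illustrative only, and the descent step (minimizing the model over a ball, a QCQP) is not shown.

```python
import numpy as np

# Hypothetical max-type objective: f(x) = max_i (0.5 x^T A_i x + b_i^T x).
A = [np.array([[2.0, 0.0], [0.0, 1.0]]), np.array([[1.0, 0.0], [0.0, 3.0]])]
b = [np.array([1.0, 0.0]), np.array([-1.0, 0.0])]


def pieces(x):
    """Values of all smooth pieces at x."""
    return np.array([0.5 * x @ Ai @ x + bi @ x for Ai, bi in zip(A, b)])


def f(x):
    """The nonsmooth objective: pointwise maximum of the pieces."""
    return pieces(x).max()


def oracle(y):
    """Return (f(y), gradient, Hessian) taken from one active piece at y."""
    vals = pieces(y)
    i = int(np.argmax(vals))
    return vals[i], A[i] @ y + b[i], A[i]


def sampled_model(samples):
    """Maximum of the second-order Taylor expansions around the samples."""
    data = [(y, *oracle(y)) for y in samples]

    def model(z):
        return max(
            fy + g @ (z - y) + 0.5 * (z - y) @ H @ (z - y)
            for y, fy, g, H in data
        )

    return model


x = np.zeros(2)   # current point, located on the kink of f
eps = 0.1         # radius of the sampling ball around x
# Hand-picked samples inside the eps-ball, one on each side of the kink.
samples = [x + eps * np.array([0.5, 0.0]), x + eps * np.array([-0.5, 0.0])]

model = sampled_model(samples)            # sees both pieces
one_sided = sampled_model(samples[:1])    # sees only one piece

z = np.array([-0.03, 0.0])
print(f(z), model(z), one_sided(z))
```

Because each piece here is quadratic, its second-order Taylor expansion reproduces the piece exactly, so the model built from both samples agrees with `f` at `z`, while the model built from a single sample underestimates it. For objectives that are not piecewise quadratic, the model is only an approximation.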
