Technical results on the convergence of quasi-Newton methods for nonsmooth optimization

Bennet Gebken

TL;DR

This work addresses the gap between the practical success of quasi-Newton methods, notably BFGS, in nonsmooth optimization and the lack of solid convergence theory. Rather than proving full spectral properties of the quasi-Newton matrices, it asks which eigenvalue behavior would suffice for convergence on piecewise differentiable functions. Under behavioral assumptions that are derived from numerical experiments and not themselves proven, the author establishes two main results: (i) under Behavior B1, the limit point $\bar{x}$ is Clarke critical; (ii) near the minimum, small eigenvalues persist in a controlled way so that the first $m$ iterations visit all smooth pieces, revealing the function's piecewise structure. These results explain how secant information can encode nonsmooth information into the quasi-Newton matrix and how restarts influence structure learning. They also suggest a potential avenue for integrating quasi-Newton updates with bundle or gradient-sampling approaches without solving subproblems.

Abstract

It is well known by now that the BFGS method is effective for minimizing nonsmooth functions. However, despite its popularity, theoretical convergence results are almost non-existent. One of the difficulties in analyzing the nonsmooth case is that the secant equation forces certain eigenvalues of the quasi-Newton matrix to vanish, a behavior that has not yet been fully analyzed. In this article, we show what kind of eigenvalue behavior would be sufficient to prove convergence for piecewise differentiable functions. More precisely, we derive assumptions on the behavior from numerical experiments and then prove criticality of the limit under these assumptions. Furthermore, we show how quasi-Newton methods are able to explore the piecewise structure. While we do not prove that the observed behavior of the eigenvalues actually occurs, we believe that these results still give insight into, and a certain intuition for, the convergence for nonsmooth functions.
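The effect of the secant equation on the eigenvalues can be illustrated with a small numerical sketch. The example below is not taken from the article: the test function $f(x) = |x_1| + \tfrac{1}{2} x_2^2$, the hypothetical step across the kink, and the convention that the quasi-Newton matrix $H$ approximates the inverse Hessian (so the secant equation reads $H y = s$) are all assumptions made for illustration. For a step $s$ of length $2t$ across the kink, the gradient difference $y$ keeps a fixed size, so the Rayleigh quotient $y^\top H y / y^\top y = y^\top s / y^\top y$ shrinks with $t$ and bounds the smallest eigenvalue of the updated matrix from above.

```python
import numpy as np

def bfgs_inverse_update(H, s, y):
    """Standard BFGS update of an inverse Hessian approximation H.

    The result H_new satisfies the secant equation H_new @ y == s.
    Requires the curvature condition y @ s > 0.
    """
    rho = 1.0 / (y @ s)
    V = np.eye(len(s)) - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)

def grad_f(x):
    # Gradient of the illustrative function f(x) = |x[0]| + 0.5 * x[1]**2,
    # valid away from the kink x[0] = 0 (assumed test function).
    return np.array([np.sign(x[0]), x[1]])

def smallest_eig_after_kink_step(t):
    # Hypothetical step of length 2t that crosses the kink at x[0] = 0.
    x_old = np.array([-t, 1.0])
    x_new = np.array([t, 1.0])
    s = x_new - x_old                   # step, shrinks with t
    y = grad_f(x_new) - grad_f(x_old)   # gradient jump, independent of t
    H_new = bfgs_inverse_update(np.eye(2), s, y)
    assert np.allclose(H_new @ y, s)    # secant equation holds
    return np.linalg.eigvalsh(H_new)[0] # eigvalsh sorts ascending

eigs = [smallest_eig_after_kink_step(t) for t in (1e-1, 1e-2, 1e-3)]
print(eigs)
```

In this toy setting the smallest eigenvalue of the updated matrix equals $t$, so it vanishes as the iterates approach the kink, while the eigenvalue along the smooth direction stays at 1. This only shows a single update from the identity; it says nothing about the long-run eigenvalue behavior that the article's assumptions concern.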
