Technical results on the convergence of quasi-Newton methods for nonsmooth optimization

Bennet Gebken

TL;DR

This work addresses the gap between the practical success of quasi-Newton methods, notably BFGS, in nonsmooth optimization and the lack of convergence theory for them. Rather than proving spectral properties of the quasi-Newton matrices, it asks which eigenvalue behavior would suffice for convergence on piecewise differentiable functions. Under behavioral assumptions derived from numerical experiments, the paper establishes two main results: (i) under Behavior B1, the limit point $\bar{x}$ is Clarke critical; (ii) near the minimum, small eigenvalues persist in a controlled way so that the first $m$ iterations visit all smooth pieces, revealing the function's piecewise structure. The assumed eigenvalue behavior is observed numerically but not proven. These results explain how the secant equation encodes nonsmooth information in the quasi-Newton matrix and how restarts influence structure learning, and they suggest a way to combine quasi-Newton updates with bundle or gradient-sampling approaches without solving subproblems.

Abstract

It is well known by now that the BFGS method is an effective method for minimizing nonsmooth functions. However, despite its popularity, theoretical convergence results are almost non-existent. One of the difficulties when analyzing the nonsmooth case is that the secant equation forces certain eigenvalues of the quasi-Newton matrix to vanish, a behavior that has not yet been fully analyzed. In this article, we show what kind of eigenvalue behavior would be sufficient to prove convergence for piecewise differentiable functions. More precisely, we derive assumptions on the behavior from numerical experiments and then prove criticality of the limit under these assumptions. Furthermore, we show how quasi-Newton methods are able to explore the piecewise structure. While we do not prove that the observed behavior of the eigenvalues actually occurs, we believe that these results still give insight, and a certain intuition, for the convergence for nonsmooth functions.
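
The mechanism mentioned in the abstract, where the secant equation forces eigenvalues of the quasi-Newton matrix to vanish, can be illustrated with a minimal sketch. It is not taken from the paper: the toy function $f(x) = |x_1| + \tfrac{1}{2} x_2^2$, the step across the kink, and all function names are assumptions chosen for illustration, and $H$ is assumed to denote the inverse Hessian approximation. After a standard BFGS update, the secant equation $H_{+} y = s$ gives $y^\top H_{+} y = y^\top s$, so the smallest eigenvalue of $H_{+}$ is at most $y^\top s / y^\top y$. When a short step $s$ crosses a kink, $y$ stays bounded away from zero, and this bound shrinks with the step length.

```python
import math

def grad_f(x):
    # Gradient of the toy function f(x) = |x1| + 0.5 * x2**2 (valid for x1 != 0).
    return (math.copysign(1.0, x[0]), x[1])

def bfgs_inverse_update(H, s, y):
    # Standard BFGS update of a 2x2 inverse Hessian approximation:
    # H+ = V^T H V + rho * s s^T with V = I - rho * y s^T and rho = 1 / (y^T s).
    rho = 1.0 / (y[0] * s[0] + y[1] * s[1])
    V = [[(1.0 if i == j else 0.0) - rho * y[i] * s[j] for j in range(2)]
         for i in range(2)]
    HV = [[sum(H[i][k] * V[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    return [[sum(V[k][i] * HV[k][j] for k in range(2)) + rho * s[i] * s[j]
             for j in range(2)] for i in range(2)]

def min_eig_sym2(H):
    # Smallest eigenvalue of a symmetric 2x2 matrix in closed form.
    mean = 0.5 * (H[0][0] + H[1][1])
    radius = math.hypot(0.5 * (H[0][0] - H[1][1]), H[0][1])
    return mean - radius

def kink_step(t):
    # Hypothetical step of length proportional to t that crosses the kink x1 = 0.
    x_old, x_new = (-t, 1.0), (t, 1.0 - t)
    g_old, g_new = grad_f(x_old), grad_f(x_new)
    s = (x_new[0] - x_old[0], x_new[1] - x_old[1])
    y = (g_new[0] - g_old[0], g_new[1] - g_old[1])
    H_new = bfgs_inverse_update([[1.0, 0.0], [0.0, 1.0]], s, y)
    # Rayleigh quotient bound implied by the secant equation H_new y = s.
    bound = (s[0] * y[0] + s[1] * y[1]) / (y[0] ** 2 + y[1] ** 2)
    return min_eig_sym2(H_new), bound, H_new, s, y

for t in (1e-1, 1e-3, 1e-5):
    lam_min, bound, _, _, _ = kink_step(t)
    print(f"t = {t:.0e}: lambda_min = {lam_min:.3e}, bound = {bound:.3e}")
```

In this sketch the smallest eigenvalue after a single update from the identity is bounded by a quantity of order $t$, so shorter steps across the kink produce smaller eigenvalues. This only shows one update on a two-dimensional toy problem; it says nothing about the long-run eigenvalue behavior that the paper treats as an assumption.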

Paper Structure

This paper contains 7 sections, 8 theorems, 34 equations, 3 figures, 1 algorithm.

Key Result

Lemma 1

Assume that $f$ satisfies Assumption A1 and that Behavior B1 holds. Then

Figures (3)

  • Figure 1: (a) The distance of $f(x^k)$ to $f(x^*)$ in Example \ref{example:criticality}. Each color corresponds to one run of the BFGS method for one problem instance. (b) The eigenvalues $\lambda_5^k$, $\lambda_6^k$, and $\lambda_{10}^k$ of $H_k$ for each run. (c) The value \eqref{eq:secant_memory_numerically} for each run, with the same coloring as in (a).
  • Figure 2: (a) The relevant eigenvalues of $(H_k)_k$ in the first $m-1$ iterations in Example \ref{example:exploration} for initial points closer and closer to $x^*$. (Note that the dots are all exactly on the value $10^0 = 1$.) (b) The value \eqref{eq:theta_numerically} for the same initial points.
  • Figure 3: (a) The distance of $f(x^k)$ to $f(x^*)$ in Example \ref{example:restarts}, with restarts every $m$ (black) or $m-1$ (red) iterations. (b) The number of unique selection functions encountered between restarts. (c) The step lengths $(t_k)_k$.

Theorems & Definitions (20)

  • Example 1
  • Remark 1
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • Lemma 4
  • proof
  • ...and 10 more