Safeguarding adaptive methods: global convergence of Barzilai-Borwein and other stepsize choices

Hongjia Ou; Andreas Themelis

TL;DR

This paper provides a linesearch-free proximal gradient framework for globalizing the convergence of popular stepsize choices, such as Barzilai-Borwein and one-dimensional Anderson acceleration, and refines the existing results on which it builds.

Abstract

Leveraging recent advances in adaptive methods for convex minimization problems, this paper provides a linesearch-free proximal gradient framework for globalizing the convergence of popular stepsize choices such as Barzilai-Borwein and one-dimensional Anderson acceleration. This framework can cope with problems in which the gradient of the differentiable function is merely locally Hölder continuous. Our analysis not only encompasses but also refines the existing results on which it builds. The theory is corroborated by numerical evidence that showcases the synergistic interplay between fast stepsize selections and adaptive methods.
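As a rough illustration of the stepsize choices the abstract refers to, the sketch below computes the classical long and short Barzilai-Borwein stepsizes, $\langle s,s\rangle/\langle s,y\rangle$ and $\langle s,y\rangle/\langle y,y\rangle$ with $s = x^k - x^{k-1}$ and $y = \nabla f(x^k) - \nabla f(x^{k-1})$, and uses them inside a proximal gradient loop for an $\ell_1$-regularized problem. The function names, the parameters `gamma_min` and `gamma_max`, and the clamping rule are assumptions made for this example only: the clamp is a naive stand-in and is not the adaptive safeguard developed in the paper.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def bb_stepsizes(x_prev, x, g_prev, g):
    """Return (long BB, short BB), or (None, None) if <s, y> <= 0."""
    s = [a - b for a, b in zip(x, x_prev)]   # iterate difference
    y = [a - b for a, b in zip(g, g_prev)]   # gradient difference
    sy = dot(s, y)
    if sy <= 0:
        return None, None                    # no usable curvature information
    return dot(s, s) / sy, sy / dot(y, y)


def soft_threshold(v, t):
    """Proximal map of t * ||.||_1, applied componentwise."""
    return [max(abs(a) - t, 0.0) * (1.0 if a > 0 else -1.0) for a in v]


def prox_grad_bb(grad_f, x0, lam, gamma0, gamma_min, gamma_max,
                 iters=300, use_long=True):
    """Proximal gradient with BB stepsizes clamped to [gamma_min, gamma_max].

    The clamp is a hypothetical, naive safeguard chosen for illustration;
    it is NOT the safeguarding mechanism analyzed in the paper.
    """
    x_prev, g_prev = list(x0), grad_f(x0)
    gamma = gamma0
    x = soft_threshold(
        [xi - gamma * gi for xi, gi in zip(x_prev, g_prev)], gamma * lam)
    for _ in range(iters):
        g = grad_f(x)
        long_bb, short_bb = bb_stepsizes(x_prev, x, g_prev, g)
        candidate = long_bb if use_long else short_bb
        if candidate is not None:            # otherwise keep the previous gamma
            gamma = min(max(candidate, gamma_min), gamma_max)
        x_prev, g_prev = x, g
        x = soft_threshold(
            [xi - gamma * gi for xi, gi in zip(x, g)], gamma * lam)
    return x
```

By the Cauchy-Schwarz inequality the long stepsize is never smaller than the short one whenever $\langle s,y\rangle > 0$. Capping `gamma_max` at the reciprocal of a known Lipschitz constant makes the loop a plain proximal gradient method; the point of the paper's framework is to avoid needing such a constant.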

Paper Structure (16 sections, 3 theorems, 37 equations, 3 figures, 1 algorithm)

Introduction
Contributions
Paper organization
Adaptive methods as safeguards
Motivating examples: Barzilai-Borwein stepsizes
The locally Hölder case
Convergence of adaptive methods revisited
A convergence recipe
Preliminary results on adaptive methods
Convergence analysis
Choice of stepsizes
Long and short Barzilai-Borwein
Martinez' rule for long and short BB
Least normalized secant error for long and short BB
Anderson acceleration
...and 1 more sections

Key Result

lemma 1

Let ass:basic hold with $\nu\in[0,1]$. Then, $\seq{x^k}$ and $\seq{\gamma_k}$ generated by adaPG$^{q,\frac{q}{2}}$ (eq:PG and eq:adaPG) satisfy prop:adaPG1, prop:adaPG2, and prop:lammin with [...], where $L_{\Omega,\nu}$ is a $\nu$-Hölder modulus for $\nabla f$ on a compact and convex set $\Omega$ that contains all the iterates $x^k$.
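For orientation, the usual meaning of a $\nu$-Hölder modulus on a set is recalled below. This is the standard textbook definition, stated here as an assumption about the notation rather than quoted from the paper.

```latex
% Standard definition (assumed, not quoted from the paper):
% L_{\Omega,\nu} is a \nu-H\"older modulus for \nabla f on \Omega if
\[
  \|\nabla f(x) - \nabla f(y)\| \;\le\; L_{\Omega,\nu}\,\|x - y\|^{\nu}
  \qquad \text{for all } x, y \in \Omega .
\]
```

With $\nu = 1$ this reduces to Lipschitz continuity of $\nabla f$ on $\Omega$.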

Figures (3)

Figure 1: Random lasso problem with $\ell_1$-regularization parameter $\lambda=0.1$. $n_\star$ represents the number of nonzero elements in the solution.
Figure 2: Regularized logistic regression ($m$ and $n$ are the number of samples and features). The $\ell_1$-regularization parameter $\lambda$ is set as in zhou2024adabb.
Figure 3: Cubic regularization problem with Hessian and gradient generated from the logistic loss problem evaluated at zero on the mushroom and phishing datasets. The cubic regularization parameter is set as $M = 0.01$.

Theorems & Definitions (7)

proof
lemma 1: compliance of adaPG$^{q,\frac{q}{2}}$
proof
theorem 1
proof
lemma 2: convergence of alg:safe
proof