
Safeguarding adaptive methods: global convergence of Barzilai-Borwein and other stepsize choices

Hongjia Ou, Andreas Themelis

TL;DR

This paper provides a linesearch-free proximal gradient framework for globalizing the convergence of popular stepsize choices such as Barzilai-Borwein and one-dimensional Anderson acceleration, and it refines the existing results on which it builds.

Abstract

Leveraging recent advances in adaptive methods for convex minimization, this paper provides a linesearch-free proximal gradient framework for globalizing the convergence of popular stepsize choices such as Barzilai-Borwein and one-dimensional Anderson acceleration. The framework can cope with problems in which the gradient of the differentiable function is merely locally Hölder continuous. Our analysis not only encompasses but also refines the existing results on which it builds. The theory is corroborated by numerical evidence showcasing the synergy between fast stepsize selections and adaptive methods.
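To make concrete the kind of stepsize the framework is meant to globalize, here is a minimal sketch, not the authors' safeguarded method, of a proximal gradient iteration for the lasso problem $\min_x \tfrac12\|Ax-b\|^2 + \lambda\|x\|_1$ driven by the Barzilai-Borwein (BB1) stepsize $\gamma_k = \langle s,s\rangle/\langle s,y\rangle$, with $s = x^k - x^{k-1}$ and $y = \nabla f(x^k) - \nabla f(x^{k-1})$. The clipping to [gamma_min, gamma_max] and keeping the previous stepsize when $\langle s,y\rangle \le 0$ are common heuristics assumed here; they do not by themselves guarantee global convergence, which is the gap a safeguard is meant to close. All function and parameter names (prox_grad_bb, soft_threshold, grad_f, gamma_min, gamma_max) are hypothetical.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, x):
    # A x, with A stored as a list of rows
    return [dot(row, x) for row in A]


def rmatvec(A, r):
    # A^T r
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def grad_f(A, b, x):
    # gradient of the smooth part f(x) = 0.5 * ||A x - b||^2
    residual = [ax - bi for ax, bi in zip(matvec(A, x), b)]
    return rmatvec(A, residual)


def soft_threshold(v, tau):
    # prox of tau * ||.||_1: componentwise shrinkage toward zero
    return [math.copysign(max(abs(vi) - tau, 0.0), vi) for vi in v]


def prox_grad_step(A, b, lam, x, g, gamma):
    # forward (gradient) step on f, then backward (prox) step on lam * ||.||_1
    return soft_threshold([xi - gamma * gi for xi, gi in zip(x, g)], gamma * lam)


def prox_grad_bb(A, b, lam, x0, gamma0=1e-3, iters=500, tol=1e-12,
                 gamma_min=1e-10, gamma_max=1e10):
    """Unsafeguarded BB1 proximal gradient for the lasso (illustration only;
    the clipping heuristic below is an assumption, not the paper's safeguard)."""
    x_prev, g_prev = list(x0), grad_f(A, b, x0)
    gamma = gamma0
    x = prox_grad_step(A, b, lam, x_prev, g_prev, gamma)
    for _ in range(iters):
        s = [xi - pi for xi, pi in zip(x, x_prev)]
        if max(abs(si) for si in s) <= tol:  # iterates have stalled
            break
        g = grad_f(A, b, x)
        y = [gi - qi for gi, qi in zip(g, g_prev)]
        sy = dot(s, y)
        if sy > 0:  # BB1 is only meaningful when <s, y> > 0
            gamma = min(max(dot(s, s) / sy, gamma_min), gamma_max)
        x_prev, g_prev = x, g
        x = prox_grad_step(A, b, lam, x, g, gamma)
    return x
```

For example, with A the 2x2 identity, b = [3.0, -0.05] and lam = 0.1, the BB1 step equals 1 once two iterates are available, and one prox step returns the soft-thresholded b, approximately [2.9, 0.0]. On general data such an unsafeguarded iteration need not converge.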

Paper Structure

This paper contains 16 sections, 3 theorems, 37 equations, 3 figures, and 1 algorithm.

Key Result

lemma 1

Let Assumption ass:basic hold with $\nu\in[0,1]$. Then the sequences $(x^k)_{k\in\mathbb{N}}$ and $(\gamma_k)_{k\in\mathbb{N}}$ generated by adaPG$^{q,\frac{q}{2}}$ (eq:PG and eq:adaPG) satisfy properties prop:adaPG1, prop:adaPG2, and prop:lammin with …, where $L_{\Omega,\nu}$ is a $\nu$-Hölder modulus for $\nabla f$ on a compact convex set $\Omega$ containing all the iterates $x^k$.
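The lemma is phrased in terms of local $\nu$-Hölder moduli of $\nabla f$. As a hedged illustration, the sketch below computes the local quotient $\|\nabla f(x)-\nabla f(y)\|/\|x-y\|^\nu$ between consecutive iterates and feeds its $\nu = 1$ case into the linesearch-free stepsize of Malitsky and Mishchenko's adaptive gradient descent, a well-known adaptive rule for smooth problems. It is not the adaPG$^{q,\frac{q}{2}}$ update analyzed in the paper, whose formula is not reproduced in this summary, and it omits the proximal term. The names holder_quotient and adaptive_gd are hypothetical.

```python
import math


def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))


def holder_quotient(x, y, gx, gy, nu):
    """Local nu-Hoelder quotient ||gx - gy|| / ||x - y||**nu (nu = 1: Lipschitz)."""
    dx = norm([a - b for a, b in zip(x, y)])
    dg = norm([a - b for a, b in zip(gx, gy)])
    return dg / dx ** nu if dx > 0 else 0.0


def adaptive_gd(grad, x0, lam0=1e-6, iters=200):
    """Malitsky-Mishchenko adaptive gradient descent (smooth f, no prox):

        lam_k = min(sqrt(1 + theta_{k-1}) * lam_{k-1}, 1 / (2 L_k)),

    where L_k is the local Lipschitz quotient of consecutive iterates and
    theta_k = lam_k / lam_{k-1}. No function values, no linesearch.
    """
    x_prev, g_prev = list(x0), grad(x0)
    x = [xi - lam0 * gi for xi, gi in zip(x_prev, g_prev)]
    lam_prev, theta = lam0, math.inf  # theta_0 = +inf, as in the original rule
    for _ in range(iters):
        g = grad(x)
        L_k = holder_quotient(x, x_prev, g, g_prev, 1.0)
        bound = 1.0 / (2.0 * L_k) if L_k > 0 else math.inf
        lam = min(math.sqrt(1.0 + theta) * lam_prev, bound)
        if math.isinf(lam):  # no curvature information yet: keep previous step
            lam = lam_prev
        x_next = [xi - lam * gi for xi, gi in zip(x, g)]
        theta, lam_prev = lam / lam_prev, lam
        x_prev, g_prev, x = x, g, x_next
    return x
```

On the one-dimensional quadratic $f(x) = 2x^2$ (gradient $4x$), the local quotient is exactly 4, so from the second step on the stepsize settles at $1/8$ and each iteration halves $x$.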

Figures (3)

  • Figure 1: Random lasso problem with $\ell_1$-regularization parameter $\lambda=0.1$. $n_\star$ represents the number of nonzero elements in the solution.
  • Figure 2: Regularized logistic regression ($m$ and $n$ denote the numbers of samples and features, respectively). The $\ell_1$-regularization parameter $\lambda$ is set as in zhou2024adabb.
  • Figure 3: Cubic regularization problem whose Hessian and gradient are those of the logistic loss evaluated at zero on the mushroom and phishing datasets. The cubic regularization parameter is set to $M = 0.01$.
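The cubic regularization test illustrates why local, rather than global, smoothness matters. Assuming the common Nesterov-Polyak form $m(x) = \langle g,x\rangle + \tfrac12\langle Hx,x\rangle + \tfrac{M}{6}\|x\|^3$ (the paper's exact scaling is not given in this summary), the gradient $g + Hx + \tfrac{M}{2}\|x\|x$ is Lipschitz on every bounded set but not globally. The sketch below, with hypothetical helper names cubic_grad and local_lipschitz, evaluates the local Lipschitz quotient at points increasingly far from the origin.

```python
import math


def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))


def cubic_grad(g, H, M, x):
    # gradient of <g, x> + 0.5 <H x, x> + (M / 6) ||x||^3  (assumed scaling)
    r = norm(x)
    Hx = [sum(hij * xj for hij, xj in zip(row, x)) for row in H]
    return [gi + hxi + 0.5 * M * r * xi for gi, hxi, xi in zip(g, Hx, x)]


def local_lipschitz(grad, x, y):
    # ||grad(x) - grad(y)|| / ||x - y||: a lower bound on any Lipschitz
    # modulus of grad over a set containing both x and y
    dg = norm([a - b for a, b in zip(grad(x), grad(y))])
    dx = norm([a - b for a, b in zip(x, y)])
    return dg / dx


if __name__ == "__main__":
    # toy 1-D data with g = 0, H = 0, M = 6, i.e. m(x) = |x|^3, grad = 3|x|x
    def grad(x):
        return cubic_grad([0.0], [[0.0]], 6.0, x)

    for R in (1.0, 10.0, 100.0):
        # quotient equals 3(2R + 1): 9.0, 63.0, 603.0
        print(R, local_lipschitz(grad, [R], [R + 1.0]))
```

The quotient grows without bound as the points move away from the origin, so no single global Lipschitz constant exists, while on any compact set a finite local modulus does.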

Theorems & Definitions (7)

  • proof
  • lemma 1: compliance of adaPG$^{q,\frac{q}{2}}$
  • proof
  • theorem 1
  • proof
  • lemma 2: convergence of alg:safe
  • proof