On the convergence of adaptive first order methods: proximal gradient and alternating minimization algorithms

Puya Latafat, Andreas Themelis, Panagiotis Patrinos

TL;DR

This paper tackles first-order convex optimization with nonsmooth terms by developing a linesearch-free adaptive framework for proximal gradient methods. It introduces adaPG$^{q,r}$, a two-parameter scheme that permits larger stepsizes and tighter lower bounds through backward-looking Lipschitz estimates, together with a general convergence theory for time-varying parameters. It also extends the idea to the alternating minimization algorithm (AMA) via a dual formulation, resulting in AdaAMA$^{q,r}$, which relaxes the strong convexity requirement to local strong convexity and thereby broadens applicability. The combination of a unified analytical framework and a dualized adaptive AMA yields practical, flexible algorithms whose performance is demonstrated in numerical experiments, and points to promising nonconvex and bilevel extensions.
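
To make "backward-looking Lipschitz estimate" concrete, the sketch below runs a linesearch-free proximal gradient loop in which the stepsize is recomputed from the last two iterates via $L_k = \|\nabla f(x^k)-\nabla f(x^{k-1})\| / \|x^k-x^{k-1}\|$. It is NOT the adaPG$^{q,r}$ update, which this summary does not reproduce: the rule used here is a swapped-in one in the style of Malitsky and Mishchenko's adaptive gradient descent, $\gamma_k=\min\{\sqrt{1+\theta_{k-1}}\,\gamma_{k-1},\ 1/(2L_k)\}$ with $\theta_k=\gamma_k/\gamma_{k-1}$. The function names, the initial $\theta=1$, and the stopping tolerance are hypothetical choices, and no guarantee from the paper is claimed for this exact rule.

```python
import numpy as np


def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1 (componentwise shrinkage).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def adaptive_prox_grad(grad_f, prox_g, x0, gamma0=1e-3, max_iter=5000, tol=1e-12):
    """Hypothetical sketch of a linesearch-free proximal gradient method.

    ASSUMPTION: Malitsky-Mishchenko-style stepsize
        gamma_k = min(sqrt(1 + theta) * gamma_{k-1}, 1 / (2 * L_k)),
    used only for illustration; this is not the adaPG^{q,r} rule.
    prox_g(v, t) must return the proximal point of t * g at v.
    """
    x_prev = np.asarray(x0, dtype=float)
    g_prev = grad_f(x_prev)
    gamma_prev, theta = gamma0, 1.0  # theta = 1 is an arbitrary initialization
    # First step uses the user-supplied initial stepsize gamma0.
    x = prox_g(x_prev - gamma_prev * g_prev, gamma_prev)
    for _ in range(max_iter):
        g = grad_f(x)
        dx = np.linalg.norm(x - x_prev)
        if dx <= tol:  # iterates stopped moving: treat as converged
            break
        # Backward-looking local Lipschitz estimate from the last two iterates.
        L_k = np.linalg.norm(g - g_prev) / dx
        cap = np.inf if L_k == 0 else 1.0 / (2.0 * L_k)
        gamma = min(np.sqrt(1.0 + theta) * gamma_prev, cap)
        theta = gamma / gamma_prev
        x_prev, g_prev, gamma_prev = x, g, gamma
        x = prox_g(x - gamma * g, gamma)  # forward (gradient) then backward (prox) step
    return x
```

The estimate reuses gradients that the method computes anyway, which is what lets such schemes avoid a linesearch.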

Abstract

Building upon recent works on linesearch-free adaptive proximal gradient methods, this paper proposes adaPG$^{q,r}$, a framework that unifies and extends existing results by providing larger stepsize policies and improved lower bounds. Different choices of the parameters $q$ and $r$ are discussed and the efficacy of the resulting methods is demonstrated through numerical simulations. In an attempt to better understand the underlying theory, its convergence is established in a more general setting that allows for time-varying parameters. Finally, an adaptive alternating minimization algorithm is presented by exploring the dual setting. This algorithm not only incorporates additional adaptivity, but also expands its applicability beyond standard strongly convex settings.
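
For orientation on the dual setting, the sketch below is the classical, non-adaptive AMA (Tseng's method) with a fixed penalty $\rho$, applied to the toy problem $\min_{x,z}\ \tfrac12\|x-a\|^2+\mu\|z\|_1$ subject to $x-z=0$. It is a baseline, not the paper's AdaAMA$^{q,r}$: it assumes the $x$-term is strongly convex (here with modulus $\sigma=1$ and constraint matrix $A=I$) and uses $\rho\leq\sigma/\|A\|^2$, a standard safe choice because AMA is proximal gradient on the dual. The function name and the toy problem are hypothetical.

```python
import numpy as np


def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def ama_l1_denoise(a, mu, rho=0.5, max_iter=1000, tol=1e-12):
    """Hypothetical sketch: classical fixed-penalty AMA for
        minimize 0.5*||x - a||^2 + mu*||z||_1  subject to  x - z = 0,
    with Lagrangian f(x) + g(z) - <y, x - z>.
    """
    a = np.asarray(a, dtype=float)
    y = np.zeros_like(a)  # dual variable for the constraint x - z = 0
    for _ in range(max_iter):
        x = a + y                                  # argmin_x 0.5||x - a||^2 - <y, x>
        z = soft_threshold(x - y / rho, mu / rho)  # augmented-Lagrangian step in z
        r = x - z                                  # primal residual
        y = y - rho * r                            # dual (proximal gradient) update
        if np.linalg.norm(r) <= tol:
            break
    return x, z
```

At convergence $x=z=\operatorname{soft}(a,\mu)$, the closed-form solution of this toy problem, which makes it a convenient sanity check.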

Paper Structure (6 sections, 10 theorems, 51 equations, 1 figure, 1 algorithm)

Key Result

theorem 1

Under Assumption ass:PG:P, for any $q>r\geq\tfrac12$ the sequence $(x^k)_{k\in\mathbb{N}}$ generated by Algorithm alg:adaPG converges to some $x^\star\in\operatorname{argmin}\varphi$. If in addition $q\leq\tfrac12(3+\sqrt5)$, then …, where $L_{f,\mathcal{V}}$ is a Lipschitz modulus for $\nabla f$ on a convex and compact set $\mathcal{V}$ that contains $(x^k)_{k\in\mathbb{N}}$. Moreover, $\min_{k\leq K}\bigl(\varphi(x^k)-\min\varphi\bigr)\leq\dots$
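
The two parameter regimes in Theorem 1 can be encoded as a quick admissibility check. The helper name below is hypothetical; it only restates the stated hypotheses ($q>r\geq\tfrac12$ for convergence, plus $q\leq\tfrac12(3+\sqrt5)$ for the additional bound) and does not evaluate the bound itself, which is not recoverable from this summary. Note that $\tfrac12(3+\sqrt5)=\bigl(\tfrac{1+\sqrt5}{2}\bigr)^2\approx 2.618$, the square of the golden ratio.

```python
import math

# Threshold from Theorem 1: (3 + sqrt 5)/2, i.e. the golden ratio squared (~2.618).
Q_MAX = (3 + math.sqrt(5)) / 2


def theorem1_conditions(q, r):
    """Hypothetical helper restating Theorem 1's hypotheses on (q, r).

    Returns (converges, extra_bound): whether q > r >= 1/2 holds, and
    whether additionally q <= (3 + sqrt 5)/2 holds.
    """
    converges = q > r >= 0.5
    extra_bound = converges and q <= Q_MAX
    return converges, extra_bound
```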

Figures (1)

  • Figure 1: First row: regularized least squares; second row: $\ell_1$-regularized logistic regression; third row: cubic regularization with the Hessian of the logistic loss problem evaluated at zero. For the linesearch method PG-ls$^b$, only the best outcome over $b\in\{1,\ 1.1,\ 1.3,\ 1.5,\ 2\}$ is reported in each simulation.

Theorems & Definitions (19)

  • theorem 1
  • lemma 1: FNE-like (firm-nonexpansiveness-like) inequality
  • remark 1
  • theorem 2: main PG inequality
  • proof
  • theorem 3: convergence of PG with nonvanishing stepsizes
  • proof
  • lemma 2
  • proof
  • definition 1
  • ...and 9 more