On the convergence of adaptive first order methods: proximal gradient and alternating minimization algorithms

Puya Latafat; Andreas Themelis; Panagiotis Patrinos

TL;DR

This paper tackles first-order convex optimization with nonsmooth terms by developing a linesearch-free adaptive framework for proximal gradient methods. It introduces adaPG$^{q,r}$, a two-parameter scheme that permits larger stepsizes and tighter lower bounds through backward-looking Lipschitz estimates, along with a general convergence theory for time-varying parameters. It also extends the idea to adaptive alternating minimization (AMA) via a dual formulation, resulting in AdaAMA$^{q,r}$ that relaxes strong convexity to local strong convexity and broadens applicability. The combination of a unified analytical framework and dualized adaptive AMA yields practical, flexible algorithms with validated performance in numerical experiments and promising directions for nonconvex and bilevel extensions.

Abstract

Building upon recent works on linesearch-free adaptive proximal gradient methods, this paper proposes adaPG$^{q,r}$, a framework that unifies and extends existing results by providing larger stepsize policies and improved lower bounds. Different choices of the parameters $q$ and $r$ are discussed and the efficacy of the resulting methods is demonstrated through numerical simulations. In an attempt to better understand the underlying theory, its convergence is established in a more general setting that allows for time-varying parameters. Finally, an adaptive alternating minimization algorithm is presented by exploring the dual setting. This algorithm not only incorporates additional adaptivity, but also expands its applicability beyond standard strongly convex settings.
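To make the idea of a linesearch-free adaptive proximal gradient step concrete, here is a minimal sketch for an $\ell_1$-regularized least squares problem. It is not the adaPG$^{q,r}$ update, whose stepsize formula is not reproduced in this summary. As a stand-in assumption it borrows the simpler stepsize rule from the adaptive gradient descent method of Malitsky and Mishchenko (bounded growth combined with a cap of $1/(2L_k)$) and pairs it with a proximal step. Here $L_k$ is a local Lipschitz estimate computed from the two most recent iterates. The function names, the initial stepsize `gamma0`, and the stopping tolerance are all hypothetical choices made for illustration.

```python
import math


def soft_threshold(v, t):
    """Proximal map of t*||.||_1, applied componentwise."""
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]


def grad_least_squares(A, b, x):
    """Gradient of f(x) = 0.5*||Ax - b||^2, namely A^T (Ax - b)."""
    residual = [sum(aij * xj for aij, xj in zip(row, x)) - bi
                for row, bi in zip(A, b)]
    return [sum(A[i][j] * residual[i] for i in range(len(A)))
            for j in range(len(x))]


def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))


def adaptive_prox_grad(A, b, lam, x0, gamma0=0.1, max_iter=1000, tol=1e-10):
    """Proximal gradient with a backward-looking stepsize (illustrative only).

    Assumption: the stepsize rule below is the Malitsky-Mishchenko style rule
    gamma = min(gamma_prev*sqrt(1 + theta), 1/(2*L_k)), NOT the adaPG^{q,r} rule.
    """
    x_prev = list(x0)
    g_prev = grad_least_squares(A, b, x_prev)
    gamma_prev, theta = gamma0, 1.0  # theta tracks the ratio of recent stepsizes
    # First iterate uses the user-supplied initial stepsize.
    x = soft_threshold([xi - gamma0 * gi for xi, gi in zip(x_prev, g_prev)],
                       gamma0 * lam)
    for _ in range(max_iter):
        dx = [a - c for a, c in zip(x, x_prev)]
        if norm(dx) <= tol:  # iterates stopped moving: treat as converged
            break
        g = grad_least_squares(A, b, x)
        dg = [a - c for a, c in zip(g, g_prev)]
        # Local Lipschitz estimate from past iterates; no function evaluations
        # and no backtracking loop are needed.
        L_k = norm(dg) / norm(dx)
        gamma = gamma_prev * math.sqrt(1.0 + theta)  # bounded stepsize growth
        if L_k > 0.0:
            gamma = min(gamma, 1.0 / (2.0 * L_k))    # cap by local curvature
        x_next = soft_threshold([xi - gamma * gi for xi, gi in zip(x, g)],
                                gamma * lam)
        theta = gamma / gamma_prev
        x_prev, g_prev, gamma_prev, x = x, g, gamma, x_next
    return x
```

A call such as `adaptive_prox_grad([[2.0, 0.0], [0.0, 1.0]], [2.0, 0.5], 1.0, [0.0, 0.0])` runs the sketch on a small separable instance whose minimizer can be checked by hand. The point of the design is that each iteration costs one gradient and one proximal evaluation, with the stepsize derived entirely from quantities already computed.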

Paper Structure (6 sections, 10 theorems, 51 equations, 1 figure, 1 algorithm)

Introduction
A general framework for adaptive proximal gradient methods
A class of adaptive alternating minimization algorithms
Numerical simulations
Conclusions
Useful lemmas

Key Result

theorem 1

Under the paper's standing assumptions for the proximal gradient problem (ass:PG:P), for any $q>r\geq\frac{1}{2}$ the sequence $(x^k)_{k\in\mathbb{N}}$ generated by adaPG$^{q,r}$ (alg:adaPG) converges to some $x^\star\in\operatorname*{arg\,min}\varphi$. If in addition $q \leq \tfrac{1}{2}(3+\sqrt5)$, then [...] where $L_{f,\mathcal{V}}$ is a Lipschitz modulus for $\nabla f$ on a convex and compact set $\mathcal{V}$ that contains $(x^k)_{k\in\mathbb{N}}$. Moreover, $\min_{k\leq K} (\varphi(x^k) - \min \varphi) \leq \ldots$

Figures (1)

Figure 1: First row: regularized least squares; second row: $\ell_1$-regularized logistic regression; third row: cubic regularization with Hessian generated for the logistic loss problem evaluated at zero. For the linesearch method PG-ls$^b$, in each simulation only the best outcome for $b\in\{1,\, 1.1,\, 1.3,\, 1.5,\, 2\}$ is reported.

Theorems & Definitions (19)

theorem 1
lemma 1: FNE-like inequality
remark 1
theorem 2: main PG inequality
proof
theorem 3: convergence of PG with nonvanishing stepsizes
proof
lemma 2
proof
definition 1
...and 9 more
