Adaptive Proximal Gradient Method for Convex Optimization

Yura Malitsky, Konstantin Mishchenko

TL;DR

This work develops fully adaptive first-order methods for convex optimization by exploiting local curvature information without extra computational cost. It introduces Adaptive Gradient Descent (AdGD) and Adaptive Proximal Gradient (AdProxGD), proving convergence under locally Lipschitz gradients and enabling larger steps than traditional fixed-step schemes. The analysis sharpens step-size bounds, proposes an improved adaptive update, and extends to the proximal/composite setting with analogous convergence guarantees. Empirical results on problems such as maximum-likelihood estimation of covariance, low-rank matrix completion, and entropy maximization demonstrate practical speedups over Armijo-type linesearch baselines, validating the approach's efficiency and robustness.

Abstract

In this paper, we explore two fundamental first-order algorithms in convex optimization, namely, gradient descent (GD) and proximal gradient method (ProxGD). Our focus is on making these algorithms entirely adaptive by leveraging local curvature information of smooth functions. We propose adaptive versions of GD and ProxGD that are based on observed gradient differences and, thus, have no added computational costs. Moreover, we prove convergence of our methods assuming only local Lipschitzness of the gradient. In addition, the proposed versions allow for even larger stepsizes than those initially suggested in [MM20].
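To make "based on observed gradient differences" concrete, here is a minimal sketch of adaptive gradient descent using the earlier step-size rule from [MM20], $\lambda_k = \min\{\sqrt{1+\theta_{k-1}}\,\lambda_{k-1},\ \|x^k - x^{k-1}\| / (2\|\nabla f(x^k) - \nabla f(x^{k-1})\|)\}$ with $\theta_k = \lambda_k/\lambda_{k-1}$. It is not the improved rule proposed in this paper, which the text above does not state. The function name `adgd`, the default `lam0`, the large finite stand-in for $\theta_0 = +\infty$, and the quadratic test problem are assumptions made for illustration.

```python
import math

def adgd(grad, x0, lam0=1e-6, iters=500):
    """Sketch of adaptive GD with the [MM20] step-size rule (illustrative only)."""
    x_prev = list(x0)
    g_prev = grad(x_prev)
    lam_prev = lam0
    theta = 1e12  # assumption: large finite stand-in for theta_0 = +infinity
    # First step uses the small initial step size lam0.
    x = [xi - lam0 * gi for xi, gi in zip(x_prev, g_prev)]
    for _ in range(iters):
        g = grad(x)
        dx = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_prev)))
        dg = math.sqrt(sum((a - b) ** 2 for a, b in zip(g, g_prev)))
        if dx == 0.0 or dg == 0.0:
            break  # no curvature information left; stop in this sketch
        # Local curvature estimate from quantities already computed:
        # no extra gradient or function evaluations are needed.
        lam = min(math.sqrt(1.0 + theta) * lam_prev, dx / (2.0 * dg))
        theta = lam / lam_prev
        x_prev, g_prev, lam_prev = x, g, lam
        x = [xi - lam * gi for xi, gi in zip(x, g)]
    return x, lam_prev

# Hypothetical test problem: f(x) = 0.5 * (x_1^2 + 10 * x_2^2).
grad_f = lambda x: [x[0], 10.0 * x[1]]
x_final, last_step = adgd(grad_f, [1.0, 1.0])
```

The only inputs are a gradient oracle and a starting point. Neither a Lipschitz constant nor a linesearch is supplied, which is the sense in which the method is adaptive.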


Paper Structure

This paper contains 33 sections, 19 theorems, 88 equations, 5 figures, 3 algorithms.

Key Result

Theorem 1

For any $c\geqslant 1$ there exists $x^0$ such that the method \eqref{bad_gd} applied to $f$ defined in \eqref{eq:counter} diverges.

Figures (5)

  • Figure 1: Maximum likelihood estimate, problem \eqref{eq:mle}
  • Figure 2: Low-rank matrix completion, problem \eqref{eq:lrmc}
  • Figure 3: Minimal length piecewise-linear curve, problem \eqref{eq:mlc}
  • Figure 4: Nonnegative matrix factorization, problem \eqref{eq:nmf}
  • Figure 5: Dual of the entropy maximization, problem \eqref{eq:dual_maxent}

Theorems & Definitions (44)

  • Theorem 1
  • Lemma 1
  • proof
  • Remark 1
  • Remark 2
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • Lemma 4
  • ...and 34 more