
A Simple Adaptive Proximal Gradient Method for Nonconvex Optimization

Zilong Ye, Shiqian Ma, Junfeng Yang, Danqing Zhou

TL;DR

The paper addresses composite nonconvex optimization problems of the form $F(x)=f(x)+h(x)$, where $f$ has a Lipschitz continuous gradient and $h$ is convex (possibly nonsmooth). It introduces AdaPGNC, a parameter-free, line-search-free, single-loop adaptive proximal gradient method whose step sizes are built from local curvature estimates and a summable sequence $\{\rho_k\}$. Convergence is established through a Lyapunov function. The theoretical guarantees include an optimal $O(1/\varepsilon^2)$ iteration/gradient-evaluation complexity for reaching an $\varepsilon$-stationary point in the general nonconvex case, and $O(1/k)$ rates in the convex setting, along with an ergodic variant and a Barzilai-Borwein (BB) step-size extension. In the reported experiments, AdaPGNC is competitive with state-of-the-art parameter-free methods on both nonconvex and convex problems, highlighting its practicality, simplicity, and scalability.
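The TL;DR mentions curvature-based step sizes and a BB-step extension. The exact AdaPGNC step-size formula is not reproduced in this summary, so the sketch below only computes the standard local quantities that such rules are typically built from. It assumes NumPy, and the helper names `curvature_estimates` and `bb_steps` are hypothetical. With $s = x_k - x_{k-1}$ and $y = \nabla f(x_k) - \nabla f(x_{k-1})$, it returns a lower curvature estimate $\langle y, s\rangle/\|s\|^2$ (which can be negative when $f$ is nonconvex), an upper estimate $\|y\|/\|s\|$ of the local Lipschitz constant of $\nabla f$, and the two classical Barzilai-Borwein steps.

```python
import numpy as np


def curvature_estimates(x_prev, x_curr, g_prev, g_curr):
    """Hypothetical helper: (lower, upper) local curvature along s = x_curr - x_prev.

    lower = <y, s> / ||s||^2   (Rayleigh-type quotient; negative under negative curvature)
    upper = ||y|| / ||s||      (local estimate of the Lipschitz constant of grad f)
    with y = g_curr - g_prev. By Cauchy-Schwarz, lower <= upper.
    """
    s = x_curr - x_prev
    y = g_curr - g_prev
    ss = float(s @ s)
    if ss == 0.0:
        raise ValueError("consecutive iterates coincide; curvature is undefined")
    lower = float(y @ s) / ss
    upper = float(np.linalg.norm(y)) / np.sqrt(ss)
    return lower, upper


def bb_steps(x_prev, x_curr, g_prev, g_curr):
    """Hypothetical helper: Barzilai-Borwein steps BB1 = s's / s'y and BB2 = s'y / y'y.

    Returns (None, None) when s'y <= 0 (no positive curvature along s); this
    convention is chosen for the sketch only.
    """
    s = x_curr - x_prev
    y = g_curr - g_prev
    sy = float(s @ y)
    if sy <= 0.0:
        return None, None
    return float(s @ s) / sy, sy / float(y @ y)


# Usage: f(x) = 0.5 * x @ A @ x with an indefinite A is nonconvex, grad f(x) = A @ x.
A = np.diag([1.0, -2.0])
x0, x1 = np.zeros(2), np.ones(2)
print(curvature_estimates(x0, x1, A @ x0, A @ x1))  # lower estimate is -0.5 here
print(bb_steps(x0, x1, A @ x0, A @ x1))  # (None, None): negative curvature along s
```

When the lower estimate is positive, BB1 is exactly its reciprocal, which is one way to see how curvature estimates and BB steps are related. How AdaPGNC combines these quantities, and how its BB variant treats negative curvature, is not specified here.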

Abstract

Consider composite nonconvex optimization problems where the objective function consists of a smooth nonconvex term (with Lipschitz-continuous gradient) and a convex (possibly nonsmooth) term. Existing parameter-free methods for such problems often rely on complex multi-loop structures, require line searches, or depend on restrictive assumptions (e.g., bounded iterates). To address these limitations, we introduce a novel adaptive proximal gradient method (referred to as AdaPGNC) that features a simple single-loop structure, eliminates the need for line searches, and only requires the gradient's Lipschitz continuity to ensure convergence. Furthermore, AdaPGNC achieves the theoretically optimal iteration/gradient evaluation complexity of $\mathcal{O}(\varepsilon^{-2})$ for finding an $\varepsilon$-stationary point. Our core innovation lies in designing an adaptive step size strategy that leverages upper and lower curvature estimates. A key technical contribution is the development of a novel Lyapunov function that effectively balances the function value gap and the squared norm of consecutive iterate differences, serving as a central component in our convergence analysis. Preliminary experimental results indicate that AdaPGNC demonstrates competitive performance on several benchmark nonconvex (and convex) problems against state-of-the-art parameter-free methods.
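The $\mathcal{O}(\varepsilon^{-2})$ count can be traced through a standard telescoping argument. The derivation below is a generic template, not the paper's proof. It assumes the standard proximal gradient update $x_{k+1} = \operatorname{prox}_{\lambda_k h}\big(x_k - \lambda_k \nabla f(x_k)\big)$, a lower bound $F_{\mathrm{low}}$ on $F$, a Lyapunov function of the form $\Phi_k := F(x_k) - F_{\mathrm{low}} + \theta_k \|x_k - x_{k-1}\|^2 \ge 0$ with placeholder weights $\theta_k \ge 0$, a sufficient-decrease property $\Phi_{k+1} \le \Phi_k - c\,\|x_{k+1} - x_k\|^2$ with a placeholder constant $c > 0$, and a positive lower bound $\lambda$ on the step sizes such as the one in Proposition 1. Stationarity is measured by the gradient mapping $G_{\lambda}(x) := \big(x - \operatorname{prox}_{\lambda h}(x - \lambda \nabla f(x))\big)/\lambda$.

```latex
\begin{align*}
c \sum_{k=0}^{K-1} \|x_{k+1}-x_k\|^2 &\le \Phi_0 - \Phi_K \le \Phi_0
\;\Longrightarrow\;
\min_{0\le k<K} \|x_{k+1}-x_k\|^2 \le \frac{\Phi_0}{cK}, \\
\|G_{\lambda_k}(x_k)\| = \frac{\|x_k - x_{k+1}\|}{\lambda_k} &\le \frac{\|x_{k+1}-x_k\|}{\lambda}
\qquad (\lambda_k \ge \lambda > 0), \\
\min_{0\le k<K} \|G_{\lambda_k}(x_k)\| &\le \frac{1}{\lambda}\sqrt{\frac{\Phi_0}{cK}} \le \varepsilon
\qquad\text{whenever}\qquad K \ge \frac{\Phi_0}{c\,\lambda^2\varepsilon^2}.
\end{align*}
```

Under these assumptions, and with one gradient evaluation per iteration, the same $\mathcal{O}(\varepsilon^{-2})$ bound applies to gradient evaluations. The role of a Lyapunov function that mixes the function value gap with $\|x_k - x_{k-1}\|^2$ is to supply a decrease inequality of this type even when $F(x_k)$ alone need not decrease monotonically.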

Paper Structure

This paper contains 13 sections, 8 theorems, 54 equations, 6 figures, 5 tables, 1 algorithm.

Key Result

Proposition 1

Let $\{\lambda_k\}$ be the sequence of step sizes generated by Algorithm 1 (AdaPGNC), and define $P:=\sum_{k=0}^{\infty} \rho_k$. Then $\{\lambda_k\}$ is bounded below and above by positive constants $\lambda>0$ and $\Lambda>0$, respectively, i.e., $\lambda \le \lambda_k \le \Lambda$ for all $k$. Specifically, we have
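Proposition 1 expresses the step-size bounds through $P=\sum_{k=0}^{\infty}\rho_k$, so $\{\rho_k\}$ must be summable. As an illustration only, the sketch below assumes the hypothetical choice $\rho_k = c/(k+1)^2$, for which $P = c\pi^2/6$, together with a step rule satisfying $\lambda_{k+1} \le (1+\rho_k)\lambda_k$. This cap is an assumption of the sketch, not a statement of AdaPGNC's rule. Because $1+t \le e^t$, such a rule gives $\lambda_k \le \lambda_0 \prod_{i<k}(1+\rho_i) \le \lambda_0 e^{P}$, which is one mechanism that keeps step sizes bounded above. The function names are hypothetical.

```python
import math


def rho(k: int, c: float = 1.0) -> float:
    """Hypothetical summable sequence rho_k = c / (k + 1)**2 (illustrative choice)."""
    return c / (k + 1) ** 2


def sum_and_growth(n: int, c: float = 1.0) -> tuple[float, float]:
    """Return (sum_{k<n} rho_k, prod_{k<n} (1 + rho_k)).

    The product is the largest factor by which a step size could grow over n
    iterations if each step may exceed the previous one by at most (1 + rho_k).
    """
    total, growth = 0.0, 1.0
    for k in range(n):
        r = rho(k, c)
        total += r
        growth *= 1.0 + r
    return total, growth


# For c = 1, P = sum_{k>=0} 1/(k+1)^2 = pi^2/6 (the Basel sum).
P = math.pi ** 2 / 6
S, G = sum_and_growth(10_000)
# Since 1 + t <= exp(t), G <= exp(S) <= exp(P): the growth factor stays bounded.
print(f"partial sum {S:.6f} <= P = {P:.6f}; growth {G:.6f} <= exp(P) = {math.exp(P):.6f}")
```

For this particular $\rho_k$ the infinite product is known in closed form, $\prod_{n\ge 1}(1+1/n^2) = \sinh(\pi)/\pi$, which is noticeably smaller than $e^{\pi^2/6}$. The bound $e^P$ is therefore loose but holds for any summable nonnegative sequence.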

Figures (6)

  • Figure 1: Comparison results for the classification problem.
  • Figure 2: Comparison results for the autoencoder training problem.
  • Figure 3: Comparison results for the matrix completion problem on MovieLens-100K.
  • Figure 4: Comparison results for the matrix completion problem on MovieLens-1M.
  • Figure 5: Comparison results for the logistic regression problem across the three datasets.
  • ...and 1 more figure

Theorems & Definitions (18)

  • Proposition 1: boundedness of $\{\lambda_k\}$
  • proof
  • Lemma 1: see malitsky2024adaptive
  • Lemma 2: see beck2017first
  • Lemma 3: properties of $\{\omega_k\}$
  • proof
  • Lemma 4
  • proof
  • Theorem 1
  • proof
  • ...and 8 more