A Simple Adaptive Proximal Gradient Method for Nonconvex Optimization
Zilong Ye, Shiqian Ma, Junfeng Yang, Danqing Zhou
TL;DR
The paper addresses composite nonconvex optimization problems of the form $F(x)=f(x)+h(x)$, where $f$ is smooth with a Lipschitz continuous gradient and $h$ is convex and possibly nonsmooth. It introduces AdaPGNC, a parameter-free, line-search-free, single-loop adaptive proximal gradient method. Its step sizes are built from curvature estimates computed from local information, together with a summable sequence $\{\rho_k\}$, and its convergence is established through a Lyapunov function. The theoretical guarantees include the optimal $\mathcal{O}(\varepsilon^{-2})$ iteration/gradient evaluation complexity for reaching an $\varepsilon$-stationary point in the general nonconvex case, and $\mathcal{O}(1/k)$ rates in the convex setting, with an ergodic variant and a Barzilai-Borwein (BB) step extension. Empirically, AdaPGNC is competitive with state-of-the-art parameter-free methods on both nonconvex and convex problems.
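For readers less familiar with the setting, the block below writes out the problem class, a generic proximal gradient step, and one common way to formalize $\varepsilon$-stationarity. It is only a sketch under stated assumptions: the symbols $L$, $\lambda_k$, and $G_\lambda$ are introduced here for illustration, and the gradient-mapping criterion is a standard choice that may differ from the exact stationarity measure used in the paper.

```latex
% Problem class: f smooth (possibly nonconvex) with L-Lipschitz gradient, h convex.
\min_{x \in \mathbb{R}^n} \; F(x) = f(x) + h(x),
\qquad
\|\nabla f(x) - \nabla f(y)\| \le L \, \|x - y\| \quad \text{for all } x, y.

% Generic proximal gradient step with step size \lambda_k > 0 (assumed notation).
x_{k+1} = \operatorname{prox}_{\lambda_k h}\bigl(x_k - \lambda_k \nabla f(x_k)\bigr),
\qquad
\operatorname{prox}_{\lambda h}(v) = \arg\min_{u} \Bigl\{ h(u) + \tfrac{1}{2\lambda}\|u - v\|^2 \Bigr\}.

% One common stationarity measure (an assumption here): the gradient mapping.
G_{\lambda}(x) = \tfrac{1}{\lambda}\Bigl(x - \operatorname{prox}_{\lambda h}\bigl(x - \lambda \nabla f(x)\bigr)\Bigr),
\qquad
x \text{ is } \varepsilon\text{-stationary if } \|G_{\lambda}(x)\| \le \varepsilon.
```

Under this reading, an $\mathcal{O}(\varepsilon^{-2})$ complexity means that the number of iterations (and gradient evaluations) needed to reach such a point grows at most proportionally to $1/\varepsilon^{2}$.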
Abstract
Consider composite nonconvex optimization problems where the objective function consists of a smooth nonconvex term (with Lipschitz-continuous gradient) and a convex (possibly nonsmooth) term. Existing parameter-free methods for such problems often rely on complex multi-loop structures, require line searches, or depend on restrictive assumptions (e.g., bounded iterates). To address these limitations, we introduce a novel adaptive proximal gradient method (referred to as AdaPGNC) that features a simple single-loop structure, eliminates the need for line searches, and only requires the gradient's Lipschitz continuity to ensure convergence. Furthermore, AdaPGNC achieves the theoretically optimal iteration/gradient evaluation complexity of $\mathcal{O}(\varepsilon^{-2})$ for finding an $\varepsilon$-stationary point. Our core innovation lies in designing an adaptive step size strategy that leverages upper and lower curvature estimates. A key technical contribution is the development of a novel Lyapunov function that balances the function value gap against the squared norm of the difference between consecutive iterates, serving as a central component in our convergence analysis. Preliminary experimental results indicate that AdaPGNC performs competitively against state-of-the-art parameter-free methods on several benchmark nonconvex (and convex) problems.
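To make the idea of a line-search-free, single-loop method with curvature-based step sizes more concrete, here is a minimal Python sketch. It is not the AdaPGNC update: the abstract does not spell out the paper's step size rule, its use of $\{\rho_k\}$, or its Lyapunov function. The sketch uses a simple illustrative rule, $\lambda_k = \tfrac{1}{2}\|x_k - x_{k-1}\| / \|\nabla f(x_k) - \nabla f(x_{k-1})\|$, which carries no convergence guarantee in general. The names `soft_threshold`, `prox_grad_local_curvature`, and `step0` are hypothetical, and the $\ell_1$ regularizer is only an example of a convex $h$.

```python
import numpy as np


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (an illustrative choice of convex h)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def prox_grad_local_curvature(grad_f, prox_h, x0, step0=1e-3, iters=500):
    """Single-loop proximal gradient with a step from a local curvature estimate.

    Illustrative only, NOT the AdaPGNC rule from the paper.
    grad_f(x) returns the gradient of f at x.
    prox_h(v, t) returns the proximal point of t * h at v.
    """
    x_prev = np.asarray(x0, dtype=float).copy()
    g_prev = grad_f(x_prev)
    step = step0
    # First step uses the small initial guess step0 (a hypothetical default).
    x = prox_h(x_prev - step * g_prev, step)
    for _ in range(iters):
        g = grad_f(x)
        ndx = np.linalg.norm(x - x_prev)
        ndg = np.linalg.norm(g - g_prev)
        if ndx == 0.0:
            break  # iterates stopped moving
        if ndg > 0.0:
            # Local Lipschitz estimate L_k = ndg / ndx; take step = 1 / (2 * L_k).
            step = 0.5 * ndx / ndg
        x_prev, g_prev = x, g
        x = prox_h(x - step * g, step)
    return x


# Usage on a toy problem: f(x) = 0.5 * ||x - c||^2, h(x) = lam * ||x||_1.
# The exact minimizer is soft_threshold(c, lam), which gives a reference value.
c = np.array([3.0, -0.5, 1.5])
lam = 1.0
x_hat = prox_grad_local_curvature(
    grad_f=lambda x: x - c,
    prox_h=lambda v, t: soft_threshold(v, t * lam),
    x0=np.zeros(3),
)
print(np.allclose(x_hat, soft_threshold(c, lam), atol=1e-6))
```

The toy problem is convex and well conditioned, so this naive rule behaves well on it. On genuinely nonconvex problems a rule this simple can fail, which is the gap that the upper and lower curvature estimates and the Lyapunov analysis described in the abstract are meant to close.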
