Convergence analysis of nonmonotone proximal gradient methods under local Lipschitz continuity and Kurdyka--Łojasiewicz property
Xiaoxi Jia, Kai Wang
TL;DR
The paper studies convergence of nonmonotone proximal gradient methods for nonconvex composite problems under local Lipschitz continuity of the gradient and the Kurdyka--Łojasiewicz property. It analyzes two line-search variants—average and max—using a two-set index partition to recover descent-like behavior without requiring bounded iterates, and proves global convergence of the entire sequence to M-stationary points along with KL-based convergence rates that depend on the exponent $\theta$ of the desingularization function $\chi$. The authors further show that convergence outcomes are largely independent of the specific partitioning strategy, and extend the max-line-search analysis to obtain analogous convergence and rate results without auxiliary sequences. The results broaden proximal-gradient convergence theory for nonconvex problems by relaxing global Lipschitz assumptions and providing unified KL-driven rates, with practical implications for optimization in machine learning, imaging, and data science.
Abstract
The proximal gradient method is a standard approach for solving composite minimization problems in which the objective function is the sum of a continuously differentiable function and a lower semicontinuous, extended-valued function. The traditional convergence theory for both monotone and nonmonotone variants relies heavily on the assumption that the gradient of the smooth part of the objective function is globally Lipschitz continuous. Recent work has shown that monotone proximal gradient methods converge globally when only local (rather than global) Lipschitz continuity is assumed, provided that the Kurdyka--Łojasiewicz (KL) property holds. However, these results have not been extended to nonmonotone proximal gradient (NPG) methods. In this manuscript, we consider two types of NPG methods: one combined with the average line search and one combined with the max line search. By partitioning the iteration indices into two subsets, one of which aims to achieve a sufficient decrease in the sequence of function values, we establish global convergence and rate-of-convergence results using local Lipschitz continuity and the KL property, without requiring boundedness of the iterates. While finalizing this work, we noticed that [18] presents analogous results for the NPG method with average line search, but with a different partitioning strategy. Taken together, the two works lead us to conclude that the convergence theory of the NPG method is independent of the choice of index partition.
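To make the max line search concrete, the following is a minimal Python sketch of a nonmonotone proximal gradient iteration. It is an illustration under stated assumptions, not the authors' algorithm as specified in the paper: the test problem (least squares plus an $\ell_1$ penalty), the function names `soft_threshold` and `npg_max`, and all parameter names and default values (`memory`, `delta`, `gamma0`, `tau`, `tol`) are hypothetical choices. A trial point is accepted once the objective falls sufficiently below the maximum of the last `memory` objective values, rather than below the current value.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal map of t * ||.||_1 (componentwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def npg_max(A, b, lam, x0, memory=5, delta=1e-4, gamma0=1.0, tau=2.0,
            tol=1e-8, max_iter=5000):
    """Sketch of a nonmonotone proximal gradient method with a max-type
    line search for  min_x 0.5*||A x - b||^2 + lam*||x||_1.

    All names and defaults are illustrative assumptions.
    """
    def f(x):
        return 0.5 * np.sum((A @ x - b) ** 2)   # smooth part

    def grad(x):
        return A.T @ (A @ x - b)                # gradient of the smooth part

    def g(x):
        return lam * np.sum(np.abs(x))          # nonsmooth part

    x = np.asarray(x0, dtype=float)
    history = [f(x) + g(x)]                     # objective values so far

    for _ in range(max_iter):
        # Reference value: maximum over the last `memory` objective values.
        ref = max(history[-memory:])
        gx = grad(x)
        gamma = gamma0                          # inverse step size, reset each iteration

        while True:
            # Proximal gradient trial step with step size 1/gamma.
            x_new = soft_threshold(x - gx / gamma, lam / gamma)
            step_sq = np.sum((x_new - x) ** 2)
            phi_new = f(x_new) + g(x_new)
            # Nonmonotone acceptance test against the reference value.
            if phi_new <= ref - 0.5 * delta * gamma * step_sq:
                break
            gamma *= tau                        # shrink the step and retry

        x = x_new
        history.append(phi_new)
        # Stop when the scaled step (a stationarity measure) is small.
        if gamma * np.sqrt(step_sq) <= tol:
            break

    return x, history
```

A call such as `x, history = npg_max(A, b, 0.1, np.zeros(A.shape[1]))` returns the final iterate and the objective values. The sequence `history` need not decrease monotonically; it is only bounded by the running maximum over the memory window. Replacing `ref` by a weighted average of past objective values would give an average-type line search in the same spirit.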
