A KL-based Analysis Framework with Applications to Non-Descent Optimization Methods
Junwen Qiu, Bohao Ma, Xiao Li, Andre Milzarek
TL;DR
This work introduces an analysis framework, based on the Kurdyka-Łojasiewicz (KL) property, for non-descent optimization methods in nonconvex settings. It yields iterate convergence results for stochastic and distributed methods that lack a strict descent property. Assuming an approximate descent condition, gradient-bounded updates, and diminishing step sizes, the framework shows under the KL property that the iterates either converge to a stationary point or diverge to infinity, and it provides local convergence rates that depend on the KL exponent $\theta$ and the step-size parameter $\gamma$. For the special case of polynomial step sizes, it gives explicit rates for function values, gradients, and iterates, with guidance on choosing $\gamma$ to obtain the fastest rate. The framework is applied to stochastic gradient descent (SGD), random reshuffling (RR), the decentralized gradient method (DGD), and federated averaging (FedAvg): it delivers new convergence guarantees for the nonconvex DGD and FedAvg settings under shuffling, and it recovers known results for RR and SGD without requiring a priori bounded iterates. Overall, the results offer a unified tool for analyzing non-descent methods in large-scale stochastic and distributed optimization.
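As background for the two parameters named above, the KL property with exponent $\theta$ is commonly stated as an inequality near a stationary point $\bar{x}$, and a polynomial step-size rule typically takes the form shown next to it. The constants $C > 0$ and $\alpha > 0$, the neighborhood of $\bar{x}$, and the range $\theta \in [0, 1)$ are assumptions taken from the standard formulation; the paper's own definitions may differ in detail.

$$
|f(x) - f(\bar{x})|^{\theta} \le C \, \|\nabla f(x)\| \quad \text{for all } x \text{ near } \bar{x}, \qquad \alpha_k = \frac{\alpha}{k^{\gamma}}, \quad k \ge 1.
$$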
Abstract
We propose a novel analysis framework, based on the Kurdyka-Łojasiewicz property, for non-descent-type optimization methods in nonconvex scenarios. The framework covers a broad class of algorithms, including those commonly employed in stochastic and distributed optimization. Specifically, it enables the analysis of first-order methods that lack a sufficient descent property and do not require access to full (deterministic) gradient information. We leverage this framework to establish, for the first time, iterate convergence and the corresponding rates for the decentralized gradient method and federated averaging under mild assumptions. Furthermore, based on the new analysis techniques, we show the convergence of the random reshuffling and stochastic gradient descent methods without requiring the typical a priori bounded-iterates assumption.
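To make the non-descent setting concrete, here is a minimal sketch of random reshuffling with a diminishing polynomial step size on a toy nonconvex problem. It is an illustration only, not the authors' implementation: the function name `random_reshuffling`, the step-size rule `alpha / k**gamma`, the default parameter values, and the test objective are all assumptions chosen for this example.

```python
import numpy as np

def random_reshuffling(grads, x0, epochs, alpha=0.1, gamma=0.75, seed=0):
    """Random reshuffling with per-epoch step size alpha / k**gamma.

    `grads` is a list of component gradient functions; the objective is
    assumed to be the average of the corresponding component functions.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    n = len(grads)
    for k in range(1, epochs + 1):
        step = alpha / k**gamma            # diminishing polynomial step size
        for i in rng.permutation(n):       # new permutation every epoch
            # Only one component gradient is used per update, so the
            # objective value may increase at individual steps.
            x = x - step * grads[i](x)
    return x

# Hypothetical toy problem: f_i(x) = x^4/4 - x^2/2 + a_i * x with sum(a_i) = 0,
# so the average is f(x) = x^4/4 - x^2/2, with stationary points -1, 0, 1.
offsets = [0.5, -0.5, 0.2, -0.2]
grads = [lambda x, a=a: x**3 - x + a for a in offsets]

x_final = random_reshuffling(grads, x0=2.0, epochs=2000)
full_grad = x_final**3 - x_final           # gradient of the averaged objective
print(float(x_final), float(full_grad))
```

Individual updates use a single component gradient, so the method is not a descent method for the averaged objective, yet with diminishing steps the iterates in this toy run settle near a stationary point.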
