A KL-based Analysis Framework with Applications to Non-Descent Optimization Methods

Junwen Qiu; Bohao Ma; Xiao Li; Andre Milzarek

TL;DR

This work introduces a KL-based analysis framework for non-descent optimization methods in nonconvex settings. It yields iterate convergence results for stochastic and distributed methods that lack a strict descent property. Assuming an approximate descent condition, gradient-bounded updates, diminishing step sizes, and the KL property, the framework shows that the iterates either converge to a stationary point or diverge to infinity, and it provides local convergence rates that depend on the KL exponent $\theta$ and the step-size parameter $\gamma$. For polynomial step sizes, it gives explicit rates for function values, gradients, and iterates, together with guidance on choosing $\gamma$ to obtain the fastest rate. The framework is applied to stochastic gradient descent (SGD), random reshuffling (RR), decentralized gradient descent (DGD), and federated averaging (FedAvg): it delivers new convergence guarantees for the nonconvex DGD and FedAvg settings under shuffling, and it recovers known results for RR and SGD without requiring a priori bounded iterates. Overall, the results offer a unified tool for analyzing non-descent methods in large-scale stochastic and distributed optimization.
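For orientation, the two quantities named above can be written out as follows. This is the commonly used form of the KL (Łojasiewicz-type) inequality with exponent $\theta$ and a generic polynomial step-size rule. The symbols $\bar{x}$, $U$, $c$, $\eta$, $\alpha$ and the exact form of $\alpha_k$ are assumptions made for illustration and may differ from the paper's formulation.

```latex
% KL inequality at a stationary point \bar{x} with exponent \theta \in [0,1):
% there exist c, \eta > 0 and a neighborhood U of \bar{x} such that
\[
  |f(x) - f(\bar{x})|^{\theta} \le c \, \|\nabla f(x)\|
  \quad \text{for all } x \in U \text{ with } |f(x) - f(\bar{x})| < \eta .
\]
% Polynomial (diminishing) step sizes with parameter \gamma (assumed form):
\[
  \alpha_k = \frac{\alpha}{k^{\gamma}}, \qquad \alpha > 0 .
\]
```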

Abstract

We propose a novel analysis framework for non-descent-type optimization methodologies in nonconvex scenarios based on the Kurdyka-Łojasiewicz (KL) property. Our framework covers a broad class of algorithms, including those commonly employed in stochastic and distributed optimization. Specifically, it enables the analysis of first-order methods that lack a sufficient descent property and do not require access to full (deterministic) gradient information. We leverage this framework to establish, for the first time, iterate convergence and the corresponding rates for the decentralized gradient method and federated averaging under mild assumptions. Furthermore, based on the new analysis techniques, we show the convergence of the random reshuffling and stochastic gradient descent methods without the typical a priori bounded-iterates assumptions.
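To make the notion of a non-descent method concrete, the following is a minimal sketch of random reshuffling with polynomially diminishing step sizes. It assumes a toy one-dimensional nonconvex finite sum with components $f_i(x) = (x^2 - a_i)^2/4$. The function names, data, and parameter values are hypothetical and are not taken from the paper.

```python
import random


def grad_i(x, a_i):
    """Gradient of the toy component f_i(x) = (x**2 - a_i)**2 / 4."""
    return x * (x * x - a_i)


def full_grad(x, data):
    """Gradient of f(x) = (1/n) * sum_i f_i(x)."""
    return sum(grad_i(x, a_i) for a_i in data) / len(data)


def random_reshuffling(x0, data, epochs, alpha=0.1, gamma=0.75, seed=0):
    """Run RR with epoch step sizes alpha_k = alpha / k**gamma (assumed form)."""
    rng = random.Random(seed)
    n = len(data)
    x = x0
    history = []
    for k in range(1, epochs + 1):
        step = alpha / k**gamma      # diminishing polynomial step size
        order = list(range(n))
        rng.shuffle(order)           # fresh permutation in every epoch
        for i in order:              # one pass over all components
            x -= (step / n) * grad_i(x, data[i])
        # Record the full gradient magnitude after each epoch.
        history.append(abs(full_grad(x, data)))
    return x, history


data = [0.5, 1.0, 1.5, 2.0]          # toy values a_i; their mean is 1.25
x_final, grad_norms = random_reshuffling(x0=2.0, data=data, epochs=2000)
```

Each inner update uses only one component gradient, so no individual step is guaranteed to decrease $f$. In this toy problem the stationary points of $f$ are $0$ and $\pm\sqrt{1.25}$, and `grad_norms` can be inspected to see how the full gradient shrinks as the step sizes decay.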

Paper Structure (27 sections, 22 theorems, 157 equations)

Introduction
Related Work
Contribution and Organization
Framework and Convergence Results
Basic Assumptions
The Proposed Framework and Convergence Results
A Special Case and Convergence Rates
Application Area I: Stochastic Approximation Methods
Application Area II: Finite-sum Optimization
Decentralized Gradient Descent
Random Reshuffling
Federated Averaging
Conclusion
Proof of Main Convergence Results
Proof of the Main Theorem, Part (a)
...and 12 more sections

Key Result

Theorem 1

Let the assumption as:func-1 hold and suppose that the iterates $\{x^k\}_k$ satisfy conditions C1--C3. Then the following statements hold:

Theorems & Definitions (43)

Theorem 1
Remark 2
Lemma 3: Uniformized KL property
proof
Theorem 4: Convergence rates
Remark 5: Optimal choice of $\gamma$
Proposition 6
proof
Remark 7
Theorem 8
...and 33 more
