A KL-based Analysis Framework with Applications to Non-Descent Optimization Methods

Junwen Qiu, Bohao Ma, Xiao Li, Andre Milzarek

TL;DR

This work introduces a KL-based analysis framework for non-descent optimization methods in nonconvex settings. It yields iterate convergence results for stochastic and distributed methods that lack a strict descent property. The framework assumes approximate descent and gradient-bounded updates with diminishing step sizes under the KL property. Under these assumptions it shows that the iterates either converge to a stationary point or diverge to infinity, and it gives local convergence rates that depend on the KL exponent $\theta$ and the step-size parameter $\gamma$. For polynomial step sizes, the framework yields explicit rates for function values, gradients, and iterates, along with guidance on choosing $\gamma$ to obtain the fastest rate. The framework is applied to stochastic gradient descent (SGD), random reshuffling (RR), decentralized gradient descent (DGD), and federated averaging (FedAvg). It delivers new convergence guarantees for nonconvex DGD and FedAvg under shuffling, and it recovers known results for RR and SGD without requiring a priori bounded iterates. Overall, the results provide a unified tool for analyzing non-descent methods used in large-scale stochastic and distributed optimization.
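For reference, the KL property with exponent $\theta$ is commonly stated in the Łojasiewicz form below, and a polynomial step-size rule with parameter $\gamma$ has the form shown. This is a sketch of the standard definitions, not the paper's exact statement. The constants $C$, $\eta$, $\alpha$, $\beta$ are assumptions introduced here, and the admissible ranges of $\theta$ and $\gamma$ used in the paper's theorems are specified there, not in this summary.

```latex
% Lojasiewicz-type KL inequality with exponent \theta \in [0,1) at a stationary point x^*
|f(x) - f(x^*)|^{\theta} \le C \, \|\nabla f(x)\|
\quad \text{for all } x \text{ with } \|x - x^*\| < \eta \text{ and } |f(x) - f(x^*)| < \eta

% Polynomial (diminishing) step sizes with step-size parameter \gamma > 0
\alpha_k = \frac{\alpha}{(k + \beta)^{\gamma}}, \qquad \alpha, \beta > 0
```

Smaller values of $\theta$ correspond to a sharper landscape near $x^*$, which is why the rates in the framework depend on both $\theta$ and $\gamma$.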

Abstract

We propose a novel analysis framework for non-descent-type optimization methods in nonconvex settings based on the Kurdyka-Łojasiewicz (KL) property. The framework covers a broad class of algorithms, including those commonly employed in stochastic and distributed optimization. Specifically, it enables the analysis of first-order methods that lack a sufficient descent property and do not require access to full (deterministic) gradient information. We leverage this framework to establish, for the first time, iterate convergence and the corresponding rates for the decentralized gradient method and federated averaging under mild assumptions. Furthermore, based on the new analysis techniques, we show convergence of the random reshuffling and stochastic gradient descent methods without the typical a priori bounded-iterates assumption.
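The non-descent behavior of random reshuffling and SGD can be illustrated on a toy problem. The following is a minimal sketch, assuming a hypothetical one-dimensional finite sum $f(x) = \frac{1}{n}\sum_i f_i(x)$ with $f_i(x) = (x^2 - 1)^2/4 + b_i x$ and $\sum_i b_i = 0$. Under this assumption $f$ is a polynomial, so it satisfies the KL property, and its stationary points are $-1$, $0$, and $1$. The helper names, the data `b`, and the schedule $\alpha_k = \alpha/(k+\beta)^{\gamma}$ with $\alpha = 0.1$, $\beta = 1$, $\gamma = 0.75$ are illustrative assumptions, not taken from the paper. Each inner step follows a single component gradient and can increase $f$. Even so, in this toy run the end-of-epoch iterates settle near a stationary point as $\alpha_k \to 0$.

```python
import random
from typing import Sequence


def grad_component(x: float, b: float) -> float:
    """Gradient of the hypothetical component f_i(x) = (x**2 - 1)**2 / 4 + b_i * x."""
    return x**3 - x + b


def full_grad(x: float) -> float:
    """Gradient of f(x) = (x**2 - 1)**2 / 4, the component average when sum(b_i) = 0."""
    return x**3 - x


def step_size(k: int, alpha: float = 0.1, beta: float = 1.0, gamma: float = 0.75) -> float:
    """Polynomial diminishing schedule alpha_k = alpha / (k + beta)**gamma (illustrative values)."""
    return alpha / (k + beta) ** gamma


def random_reshuffling(x0: float, b: Sequence[float], epochs: int, seed: int = 0) -> float:
    """RR: each epoch visits every component exactly once, in a fresh random order."""
    rng = random.Random(seed)
    x = x0
    order = list(range(len(b)))
    for k in range(epochs):
        a_k = step_size(k)  # one step size per epoch
        rng.shuffle(order)  # new permutation every epoch
        for i in order:
            x -= a_k * grad_component(x, b[i])
    return x


def sgd(x0: float, b: Sequence[float], epochs: int, seed: int = 0) -> float:
    """SGD: n steps per epoch, each on a component sampled uniformly with replacement."""
    rng = random.Random(seed)
    x = x0
    n = len(b)
    for k in range(epochs):
        a_k = step_size(k)
        for _ in range(n):
            i = rng.randrange(n)  # sampling with replacement
            x -= a_k * grad_component(x, b[i])
    return x


if __name__ == "__main__":
    b = [1.0, -0.5, -0.5]  # sums to zero, so f has stationary points -1, 0, 1
    for name, method in [("RR", random_reshuffling), ("SGD", sgd)]:
        x = method(2.0, b, epochs=2000)
        print(f"{name}: x = {x:.4f}, |f'(x)| = {abs(full_grad(x)):.2e}")
```

The only difference between the two loops is how the component index is drawn: a permutation per epoch for RR, and independent uniform draws for SGD. In both cases a single component step taken at a minimizer of $f$ moves the iterate away from it, which is exactly the lack of descent that the framework is designed to handle.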

Paper Structure (27 sections, 22 theorems, 157 equations)

Key Result

Theorem 1

Let Assumption as:func-1 hold and suppose that the iterates $\{x^k\}_k$ satisfy conditions C1–C3. Then, the following statements hold:

Theorems & Definitions (43)

  • Theorem 1
  • Remark 2
  • Lemma 3: Uniformized KL property
  • proof
  • Theorem 4: Convergence rates
  • Remark 5: Optimal choice of $\gamma$
  • Proposition 6
  • proof
  • Remark 7
  • Theorem 8
  • ...and 33 more