On the Complexity of Decentralized Smooth Nonconvex Finite-Sum Optimization

Luo Luo; Yunyan Bai; Lesi Chen; Yuxing Liu; Haishan Ye

TL;DR

This work studies decentralized smooth nonconvex finite-sum optimization, where the global objective $f(x)$ is the mean of local finite-sum components, and introduces DEAREST$^+$, a stochastic variance-reduced gradient-tracking algorithm with multi-consensus. The authors derive complexity bounds that depend on the global smoothness parameter $L$, the mean-squared smoothness parameter $\bar{L}$, and the spectral gap $\gamma$ of the network's mixing matrix, and show that the communication, computation, and local incremental first-order oracle (LIFO) complexities are near-optimal. They extend the approach to the Polyak-Łojasiewicz (PL) setting with near-optimal guarantees, and establish lower bounds for both the general nonconvex and PL cases. Numerical experiments on regression problems in both settings show that DEAREST$^+$ consistently outperforms decentralized baselines in communication, LIFO, and computation costs.
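To make the three ingredients named here concrete (a variance-reduced gradient estimator, gradient tracking, and multi-consensus), the following is a minimal sketch on a toy least-squares problem. It is not the authors' DEAREST$^+$ pseudocode, which this summary does not reproduce. The following are all assumptions made for illustration: the PAGE-style probabilistic recursive estimator, the ring mixing matrix, plain repeated averaging for the `K` consensus rounds, the coin flip and mini-batch shared across agents, and every name and parameter value (`ring_mixing_matrix`, `grads`, `run`, `eta`, `p`, `batch`, `K`, `T`).

```python
import numpy as np

def ring_mixing_matrix(m):
    """Symmetric, doubly stochastic mixing matrix for a ring of m >= 3 agents."""
    shift = np.roll(np.eye(m), 1, axis=1)
    return 0.5 * np.eye(m) + 0.25 * (shift + shift.T)

def grads(A, b, X, idx):
    """Per-agent mean gradient of 0.5 * (a^T x - b)^2 over the samples in idx.
    A: (m, n, d), b: (m, n), X: (m, d); returns an (m, d) array."""
    Ai, bi = A[:, idx, :], b[:, idx]
    resid = np.einsum("mkd,md->mk", Ai, X) - bi
    return np.einsum("mk,mkd->md", resid, Ai) / len(idx)

def run(A, b, W, eta=0.05, p=0.25, batch=2, K=3, T=300, seed=0):
    """Illustrative loop: recursive estimator V, gradient tracker S, K gossip rounds."""
    rng = np.random.default_rng(seed)
    m, n, d = A.shape
    full = np.arange(n)
    mix = np.linalg.matrix_power(W, K)      # multi-consensus as K plain averaging rounds
    X = np.zeros((m, d))                    # row i is the iterate held by agent i
    V = grads(A, b, X, full)                # local gradient estimators
    S = V.copy()                            # gradient trackers
    for _ in range(T):
        X_new = mix @ (X - eta * S)         # local descent step, then consensus
        if rng.random() < p:                # with probability p: full local gradient
            V_new = grads(A, b, X_new, full)
        else:                               # otherwise: recursive mini-batch correction
            idx = rng.choice(n, size=batch, replace=False)
            V_new = V + grads(A, b, X_new, idx) - grads(A, b, X, idx)
        S = mix @ (S + V_new - V)           # tracking step keeps mean(S) == mean(V)
        X, V = X_new, V_new
    return X, V, S

def global_grad_norm(A, b, x):
    """Norm of the gradient of the global objective f at a single point x."""
    m, n, _ = A.shape
    return np.linalg.norm(grads(A, b, np.tile(x, (m, 1)), np.arange(n)).mean(axis=0))

data_rng = np.random.default_rng(1)
m, n, d = 4, 8, 3
A = data_rng.normal(size=(m, n, d))
b = data_rng.normal(size=(m, n))
X, V, S = run(A, b, ring_mixing_matrix(m))
g_start = global_grad_norm(A, b, np.zeros(d))
g_end = global_grad_norm(A, b, X.mean(axis=0))
```

Because the mixing matrix is doubly stochastic, the average of the trackers `S` always equals the average of the estimators `V`; this is the invariant that gradient tracking relies on. The toy objective is convex and is used only to keep the example short.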

Abstract

We study the decentralized optimization problem $\min_{{\bf x}\in{\mathbb R}^d} f({\bf x})\triangleq \frac{1}{m}\sum_{i=1}^m f_i({\bf x})$, where the local function on the $i$-th agent has the form $f_i({\bf x})\triangleq \frac{1}{n}\sum_{j=1}^n f_{i,j}({\bf x})$ and every individual $f_{i,j}$ is smooth but possibly nonconvex. We propose a stochastic algorithm called the DEcentralized probAbilistic Recursive gradiEnt deScenT (DEAREST) method, which achieves an $\epsilon$-stationary point at each agent within $\tilde{\mathcal O}(L\epsilon^{-2}/\sqrt{\gamma}\,)$ communication rounds, $\tilde{\mathcal O}(n+(L+\min\{nL, \sqrt{n/m}\bar L\})\epsilon^{-2})$ computation rounds, and ${\mathcal O}(mn + \min\{mnL, \sqrt{mn}\bar L\}\epsilon^{-2})$ local incremental first-order oracle calls, where $L$ is the smoothness parameter of the objective function, $\bar L$ is the mean-squared smoothness parameter of all individual functions, and $\gamma$ is the spectral gap of the mixing matrix associated with the network. We then establish lower bounds showing that the proposed method is near-optimal. Note that the smoothness parameters $L$ and $\bar L$ used in our algorithm design and analysis are global, leading to sharper complexity bounds than existing results that depend on the local smoothness. We further extend DEAREST to solve the decentralized finite-sum optimization problem under the Polyak-Łojasiewicz condition, also achieving near-optimal complexity bounds.
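As a rough aid for reading the three bounds, the sketch below evaluates the leading terms for given problem parameters. It drops every constant and the logarithmic factors hidden in $\tilde{\mathcal O}(\cdot)$, so the numbers indicate scaling only and do not predict iteration counts. The function name `dearest_bounds` and the sample parameter values are hypothetical.

```python
import math

def dearest_bounds(m, n, L, L_bar, gamma, eps):
    """Leading terms of the stated bounds, with constants and log factors dropped.

    m: number of agents, n: components per agent, L: global smoothness,
    L_bar: mean-squared smoothness, gamma: spectral gap, eps: target accuracy.
    """
    inv_eps2 = eps ** -2
    comm = L * inv_eps2 / math.sqrt(gamma)
    comp = n + (L + min(n * L, math.sqrt(n / m) * L_bar)) * inv_eps2
    lifo = m * n + min(m * n * L, math.sqrt(m * n) * L_bar) * inv_eps2
    return comm, comp, lifo

# Hypothetical setting: 16 agents, 100 components each.
comm, comp, lifo = dearest_bounds(m=16, n=100, L=1.0, L_bar=4.0, gamma=0.25, eps=0.1)
```

With these sample values the leading terms are about 200 communication rounds, 1200 computation rounds, and 17600 LIFO calls. In both minima the $\bar L$ branch is the smaller one here, which is the regime where the mean-squared smoothness parameter determines the bound.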

Paper Structure (27 sections, 34 theorems, 271 equations, 6 figures, 4 tables, 1 algorithm)

Introduction
Preliminaries
Notations
Problem Settings
The Algorithm and Main Results
The Complexity Analysis for DEAREST$^+$
Some Basic Lemmas
The Recursion for the Lyapunov Function
The Proofs of Theorem \ref{thm:main}
The Proofs of Corollary \ref{cor:complexity-general}
The Upper Bound on Communication Rounds
The Upper Bound on LIFO Calls
The Upper Bound on Computation Rounds
The Lower Bounds for General Nonconvex Case
The Lower Bound on Communication Rounds
...and 12 more sections

Key Result

Proposition 1

The smoothness conditions in Assumptions \ref{asm:smooth-global} and \ref{asm:smooth-mean-squared} have the following relationships:

Figures (6)

Figure 1: Results on the hard instance (\ref{eq:ifo-instance-1}) in the lower bound for the general nonconvex case.
Figure 2: Results on the regularized linear regression (the general nonconvex case).
Figure 3: Experiment results on the regularized logistic regression (the general nonconvex case).
Figure 4: Results on the hard instance (\ref{eq:comm-alpha-beta-pl}) in the lower bound for the PL case.
Figure 5: Results on the linear regression (the PL case).
...and 1 more figures

Theorems & Definitions (71)

Proposition 1
Remark 1
Proposition 2
Remark 2
Remark 3
Definition 1 (\cite{hendrikx2021optimal,liu2024decentralized})
Theorem 1
Corollary 1
Remark 4
Remark 5
...and 61 more
