On the Complexity of Finite-Sum Smooth Optimization under the Polyak-Łojasiewicz Condition

Yunyan Bai; Yuxing Liu; Luo Luo

TL;DR

The paper studies finite-sum optimization under the Polyak--Łojasiewicz (PL) condition with mean-squared smoothness and establishes a nearly tight lower bound of $\Omega\left(n+\kappa\sqrt{n}\log\left(1/\varepsilon\right)\right)$ incremental first-order oracle (IFO) calls. For decentralized networks, it derives lower bounds of $\Omega\left(\kappa/\sqrt{\gamma}\log(1/\varepsilon)\right)$, $\Omega\left(\kappa\left(1+\tau/\sqrt{\gamma}\right)\log(1/\varepsilon)\right)$, and $\Omega\left(n+\kappa\sqrt{n}\log(1/\varepsilon)\right)$ on communication rounds, time cost, and local first-order oracle (LFO) calls respectively, where $\gamma$ is the spectral gap of the network's mixing matrix and $\tau$ is the time cost per communication round. To close the gap, the paper introduces DRONE, a decentralized first-order method that nearly matches these lower bounds in expectation, supported by a Lyapunov-based analysis and carefully chosen communication and computation parameters. Experiments on the hard instance from the lower-bound proof, linear regression, and logistic regression corroborate the theoretical findings and illustrate the trade-offs between LFO calls, communication rounds, and running time in decentralized optimization. Overall, the work clarifies the fundamental limits of finite-sum optimization under the PL condition and provides a nearly optimal decentralized algorithm with provable guarantees.

Abstract

This paper considers the optimization problem of the form $\min_{{\bf x}\in{\mathbb R}^d} f({\bf x})\triangleq \frac{1}{n}\sum_{i=1}^n f_i({\bf x})$, where $f(\cdot)$ satisfies the Polyak--Łojasiewicz (PL) condition with parameter $\mu$ and $\{f_i(\cdot)\}_{i=1}^n$ is $L$-mean-squared smooth. We show that any gradient method requires at least $\Omega(n+\kappa\sqrt{n}\log(1/\varepsilon))$ incremental first-order oracle (IFO) calls to find an $\varepsilon$-suboptimal solution, where $\kappa\triangleq L/\mu$ is the condition number of the problem. This result nearly matches the upper bounds on the IFO complexity of the best-known first-order methods. We also study the problem of minimizing the PL function in the distributed setting, where the individual functions $f_1(\cdot),\dots,f_n(\cdot)$ are located on a connected network of $n$ agents. We provide lower bounds of $\Omega(\kappa/\sqrt{\gamma}\,\log(1/\varepsilon))$, $\Omega((\kappa+\tau\kappa/\sqrt{\gamma}\,)\log(1/\varepsilon))$ and $\Omega\big(n+\kappa\sqrt{n}\log(1/\varepsilon)\big)$ for communication rounds, time cost and local first-order oracle (LFO) calls respectively, where $\gamma\in(0,1]$ is the spectral gap of the mixing matrix associated with the network and $\tau>0$ is the time cost per communication round. Furthermore, we propose a decentralized first-order method that nearly matches the above lower bounds in expectation.
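To get a feel for how the stated lower bounds scale, the following minimal Python sketch evaluates the expressions inside the $\Omega(\cdot)$ terms. It is only an illustration: the constants hidden by $\Omega$ are dropped, so the values indicate orders of growth rather than exact oracle counts, and the function names and sample parameter values are assumptions made for this example, not taken from the paper.

```python
import math


def ifo_lower_bound(n, kappa, eps):
    """Order of the IFO (and LFO) bound: n + kappa * sqrt(n) * log(1/eps).

    Constants hidden by the Omega notation are dropped (assumption).
    """
    return n + kappa * math.sqrt(n) * math.log(1.0 / eps)


def communication_lower_bound(kappa, gamma, eps):
    """Order of the communication-round bound: kappa / sqrt(gamma) * log(1/eps)."""
    return kappa / math.sqrt(gamma) * math.log(1.0 / eps)


def time_lower_bound(kappa, gamma, tau, eps):
    """Order of the time-cost bound: (kappa + tau * kappa / sqrt(gamma)) * log(1/eps)."""
    return (kappa + tau * kappa / math.sqrt(gamma)) * math.log(1.0 / eps)


if __name__ == "__main__":
    # Hypothetical problem sizes, chosen only to illustrate the scaling.
    n, kappa, eps = 1000, 100.0, 1e-6
    gamma, tau = 0.05, 2.0
    print("IFO / LFO calls     ~", ifo_lower_bound(n, kappa, eps))
    print("communication rounds ~", communication_lower_bound(kappa, gamma, eps))
    print("time cost            ~", time_lower_bound(kappa, gamma, tau, eps))
```

Note that a smaller spectral gap $\gamma$ (a more poorly connected network) inflates the communication and time terms by a factor of $1/\sqrt{\gamma}$, while the local oracle term does not depend on $\gamma$.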

Paper Structure (24 sections, 20 theorems, 110 equations, 3 figures, 2 tables, 5 algorithms)

Introduction
Preliminaries
Notation and Assumptions
The Finite-Sum Optimization
The Lower Bound on IFO Complexity
Lower Bounds in Decentralized Setting
Decentralized First-Order Algorithms
Numerical Experiments
Conclusion
The Proofs for Section \ref{sec:IFO}
The Proof of Lemma \ref{lem:finite-sum}
The Proof of Lemma \ref{lem:scale}
The Proof of Theorem \ref{thm:kn}
The Proof of Theorem \ref{thm:n}
The Proof of Corollary \ref{cor:ifo}
...and 9 more sections

Key Result

Lemma 3.1

The function $g_{T,t}:{\mathbb{R}}^{Tt}\to{\mathbb{R}}$ satisfies the following:

Figures (3)

Figure 1: The results for the hard instance in the proof of Theorem \ref{thm:decentralized-lower}.
Figure 2: The results for linear regression on dataset "DrivFace".
Figure 3: The results for logistic regression on dataset "RCV1".

Theorems & Definitions (41)

Definition 2.4
Definition 2.5
Definition 2.7
Definition 2.8
Definition 2.9
Definition 2.10
Lemma 3.1
Lemma 3.2
Lemma 3.3
Theorem 3.4
...and 31 more
