Enhancing Convergence of Decentralized Gradient Tracking under the KL Property
Xiaokai Chen, Tianyu Cao, Gesualdo Scutari
TL;DR
This paper studies decentralized nonconvex nonsmooth optimization over connected networks by analyzing the KL property of the global objective $u=f+r$, where $f(x)=\frac{1}{m}\sum_i f_i(x)$ is smooth and $r$ is convex and extended-valued. It shows that the SONATA gradient-tracking method converges to stationary points at rates determined by the KL exponent $\theta\in[0,1)$: R-linear for $\theta\in(0,1/2]$, sublinear for $\theta\in(1/2,1)$, and finite termination or R-linear for $\theta=0$. In the linear cases, the iteration complexity scales as $\tilde{O}\left(\frac{L}{\kappa^{1/\theta}}\frac{1}{1-\rho}\log\left(\frac{1}{\epsilon}\right)\right)$ for small $\epsilon$. A novel Lyapunov-based analysis explicitly leverages the KL property of $u$ and decouples the consensus and tracking errors, yielding rates that match those of centralized proximal-gradient methods up to network-dependent constants, except when $\theta=0$. Numerical experiments on decentralized PCA, LASSO, and SCAD-regularized regression corroborate the theory, exhibiting linear convergence in practice and demonstrating the method’s robustness to network connectivity. The work thus closes a gap between centralized KL-based convergence theory and decentralized gradient tracking for a broad class of nonconvex problems whose structure is captured by the KL property.
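To make the gradient-tracking mechanism concrete, here is a minimal sketch on a synthetic decentralized LASSO instance, $u(x)=\frac{1}{m}\sum_i \tfrac12\|A_i x-b_i\|^2+\lambda\|x\|_1$. Each agent takes a local proximal-gradient step along its tracked gradient $y_i$, averages with its neighbors, and then updates $y_i$ so that the network average of the trackers always equals the average of the local gradients. It is a simplified illustration in the spirit of SONATA, assuming a linearized local model, a fixed conservative step size, and a ring network with equal weights $1/3$; it is not the authors' exact algorithm, step-size rule, or experimental setup. The data, the value $\lambda=1$, and all function names (`make_lasso`, `ring_weights`, `prox_gradient_tracking`, `centralized_ista`) are hypothetical.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1, applied elementwise (row-wise for matrices)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def ring_weights(m):
    """Hypothetical mixing matrix: symmetric, doubly stochastic, ring of m >= 3 agents."""
    W = np.zeros((m, m))
    for i in range(m):
        for j in (i - 1, i, i + 1):
            W[i, j % m] = 1.0 / 3.0  # self weight and two neighbors, 1/3 each
    return W


def make_lasso(m=5, n=20, d=10, seed=0):
    """Hypothetical synthetic data: agent i privately holds (A[i], b[i])."""
    rng = np.random.default_rng(seed)
    x_true = np.zeros(d)
    x_true[:3] = [2.0, -1.5, 1.0]
    A = rng.standard_normal((m, n, d))
    b = np.einsum("ind,d->in", A, x_true) + 0.1 * rng.standard_normal((m, n))
    return A, b


def local_grads(A, b, X):
    """Row i is grad f_i(x_i), with f_i(x) = 0.5 * ||A_i x - b_i||^2."""
    resid = np.einsum("ind,id->in", A, X) - b
    return np.einsum("ind,in->id", A, resid)


def prox_gradient_tracking(A, b, lam, W, iters=3000):
    """SONATA-style sketch: local prox step, consensus step, gradient tracking."""
    m, _, d = A.shape
    L_max = max(np.linalg.norm(A[i], 2) ** 2 for i in range(m))
    alpha = 0.1 / L_max              # conservative fixed step size (an assumption)
    X = np.zeros((m, d))             # row i is agent i's local copy x_i
    G = local_grads(A, b, X)
    Y = G.copy()                     # y_i tracks (1/m) * sum_j grad f_j
    for _ in range(iters):
        Z = soft_threshold(X - alpha * Y, alpha * lam)  # local proximal-gradient step
        X_new = W @ Z                                   # mix with neighbors
        G_new = local_grads(A, b, X_new)
        Y = W @ Y + G_new - G                           # gradient-tracking update
        X, G = X_new, G_new
    return X


def centralized_ista(A, b, lam, iters=5000):
    """Reference: proximal gradient on u(x) = (1/m) sum_i f_i(x) + lam * ||x||_1."""
    m, n, d = A.shape
    A_all, b_all = A.reshape(m * n, d), b.reshape(m * n)
    L = np.linalg.norm(A_all, 2) ** 2 / m  # Lipschitz constant of grad f
    x = np.zeros(d)
    for _ in range(iters):
        grad = A_all.T @ (A_all @ x - b_all) / m
        x = soft_threshold(x - grad / L, lam / L)
    return x


A, b = make_lasso()
lam = 1.0
X = prox_gradient_tracking(A, b, lam, ring_weights(A.shape[0]))
x_ref = centralized_ista(A, b, lam)
print("consensus error:", np.max(np.abs(X - X.mean(axis=0))))
print("distance to centralized solution:", np.max(np.abs(X - x_ref)))
```

Because the mixing matrix is doubly stochastic, any fixed point of this iteration is consensual and coincides with a fixed point of centralized proximal gradient, which is why the sketch compares against `centralized_ista`. Recording `np.max(np.abs(X - x_ref))` at every iteration and plotting it on a log scale is one way to observe geometric (R-linear) decay; LASSO is often cited as having KL exponent $1/2$, but treat that as an assumption here.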
Abstract
We study decentralized multiagent optimization over networks modeled as undirected graphs. The optimization problem consists of minimizing a nonconvex smooth function plus a convex extended-valued function, which enforces constraints or extra structure on the solution (e.g., sparsity or low rank). We further assume that the objective function satisfies the Kurdyka-Łojasiewicz (KL) property with a given exponent $\theta\in[0,1)$. The KL property is satisfied by several (nonconvex) functions of practical interest, such as those arising in machine learning applications; in the centralized setting, it enables strong convergence guarantees. Here we establish guarantees of the same type for SONATA, a well-known decentralized algorithm based on gradient tracking. Specifically, $\textbf{(i)}$ when $\theta\in(0,1/2]$, the sequence generated by SONATA converges to a stationary solution of the problem at an R-linear rate; $\textbf{(ii)}$ when $\theta\in(1/2,1)$, a sublinear rate is certified; and finally $\textbf{(iii)}$ when $\theta=0$, the iterates either converge in a finite number of steps or converge at an R-linear rate. This matches the convergence behavior of centralized proximal-gradient algorithms except when $\theta=0$. Numerical results validate our theoretical findings.
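For reference, the problem class and the KL inequality with exponent $\theta$ can be written as follows. This is the standard form used in the KL literature, stated under assumptions: $x\in\mathbb{R}^d$, $\bar{x}$ is a stationary point of $u$, and $c,\delta,\eta>0$ are hypothetical constants; the paper's exact formulation may differ in constants and neighborhoods. Since $f$ is smooth and $r$ is convex, the subdifferential splits as $\partial u(x)=\nabla f(x)+\partial r(x)$.

```latex
\begin{aligned}
&\min_{x \in \mathbb{R}^d} \; u(x) = f(x) + r(x), \qquad f(x) = \frac{1}{m}\sum_{i=1}^{m} f_i(x),\\[4pt]
&\operatorname{dist}\big(0,\, \partial u(x)\big) \;\ge\; c\,\big(u(x) - u(\bar{x})\big)^{\theta}
\quad \text{whenever } \|x - \bar{x}\| \le \delta \text{ and } u(\bar{x}) < u(x) < u(\bar{x}) + \eta .
\end{aligned}
```

Informally, a smaller $\theta$ means that subgradients stay larger relative to the optimality gap near $\bar{x}$, so $u$ is sharper around its stationary points. This is consistent with the progression from sublinear rates for $\theta\in(1/2,1)$ to R-linear rates for $\theta\in(0,1/2]$.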
