Enhancing Convergence of Decentralized Gradient Tracking under the KL Property

Xiaokai Chen, Tianyu Cao, Gesualdo Scutari

TL;DR

This paper studies decentralized nonconvex, nonsmooth optimization over connected networks under the KL property of the global objective $u=f+r$, where $f(x)=\frac{1}{m}\sum_i f_i(x)$ is smooth and possibly nonconvex and $r$ is a convex extended-valued function. It shows that the SONATA gradient-tracking method converges to stationary points at rates determined by the KL exponent $\theta\in[0,1)$: linear for $\theta\in(0,1/2]$, sublinear for $\theta\in(1/2,1)$, and either finite-step or linear for $\theta=0$. In the linear cases, the iteration complexity scales as $\tilde{O}\left(\frac{L}{\kappa^{1/\theta}}\frac{1}{1-\rho}\log\left(\frac{1}{\epsilon}\right)\right)$ for small $\epsilon$. A novel Lyapunov-based analysis explicitly leverages the KL property of $u$ and decouples the consensus and tracking errors, yielding rates that match centralized proximal-gradient behavior up to network-dependent constants. Numerical experiments on decentralized PCA, LASSO, and SCAD-regularized regression corroborate the theory: they show linear convergence in practice and robustness to network connectivity. The work thereby closes a gap between centralized KL-based convergence theory and decentralized gradient tracking for a broad class of nonconvex problems whose structure is captured by KL growth.
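For reference, the KL property with exponent $\theta$ is commonly stated as in the sketch below. This is the standard form from the centralized literature, given here as an assumption; the paper's own Definition 1 may use different constants or neighborhood conditions. Here $\bar{x}$ is a stationary point of $u$, $\partial u$ is the (limiting) subdifferential, and $\kappa,\eta>0$ are constants introduced for illustration:

$$
\operatorname{dist}\bigl(0,\partial u(x)\bigr)\;\geq\;\kappa\,\bigl(u(x)-u(\bar{x})\bigr)^{\theta}
\quad\text{for all } x \text{ near } \bar{x} \text{ with } u(\bar{x})<u(x)<u(\bar{x})+\eta .
$$

In this form, when $u$ is smooth and $\theta=1/2$, the inequality reads $\|\nabla u(x)\|^2\geq\kappa^2\,(u(x)-u(\bar{x}))$, which is a local Polyak-Łojasiewicz-type condition. This is consistent with the linear rates reported for $\theta\in(0,1/2]$.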

Abstract

We study decentralized multiagent optimization over networks, modeled as undirected graphs. The optimization problem consists of minimizing a nonconvex smooth function plus a convex extended-valued function, which enforces constraints or extra structure on the solution (e.g., sparsity, low rank). We further assume that the objective function satisfies the Kurdyka-Łojasiewicz (KL) property with a given exponent $\theta\in [0,1)$. The KL property is satisfied by several (nonconvex) functions of practical interest, e.g., those arising in machine learning applications; in the centralized setting, it yields strong convergence guarantees. Here we establish convergence of the same type for the well-known decentralized gradient-tracking-based algorithm SONATA. Specifically, (i) when $\theta\in (0,1/2]$, the sequence generated by SONATA converges to a stationary solution of the problem at an R-linear rate; (ii) when $\theta\in (1/2,1)$, a sublinear rate is certified; and (iii) when $\theta=0$, the iterates either converge in a finite number of steps or converge at an R-linear rate. This matches the convergence behavior of centralized proximal-gradient algorithms except when $\theta=0$. Numerical results validate our theoretical findings.
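To make the setting concrete, the following is a minimal sketch of a proximal gradient-tracking iteration in the spirit of SONATA, applied to a small LASSO instance. It is not the authors' exact algorithm: it assumes a linearized local surrogate, a unit consensus step, a ring network with uniform weights, and a hand-picked step size. The helper names (`soft_threshold`, `ring_weights`, `prox_gradient_tracking`) are hypothetical.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ring_weights(m):
    """Doubly stochastic weights for a ring of m >= 3 agents (assumed topology)."""
    W = np.zeros((m, m))
    for i in range(m):
        W[i, i] = W[i, (i - 1) % m] = W[i, (i + 1) % m] = 1.0 / 3.0
    return W

def prox_gradient_tracking(A, b, lam, W, step, iters):
    """Simplified proximal gradient tracking for
       min_x (1/m) sum_i 0.5/n_i * ||A_i x - b_i||^2 + lam * ||x||_1.
       A has shape (m, n_i, d); b has shape (m, n_i)."""
    m, n_i, d = A.shape

    def grads(X):
        # Row i holds the gradient of agent i's loss evaluated at X[i].
        res = np.einsum('ijk,ik->ij', A, X) - b
        return np.einsum('ijk,ij->ik', A, res) / n_i

    X = np.zeros((m, d))   # row i is agent i's local copy of x
    G = grads(X)
    Y = G.copy()           # tracking variables, initialized to local gradients
    for _ in range(iters):
        # Local step: proximal-gradient move along the tracked direction.
        X_half = soft_threshold(X - step * Y, step * lam)
        # Consensus step: average with neighbors.
        X_new = W @ X_half
        # Tracking step: mix, then correct with the local gradient change.
        G_new = grads(X_new)
        Y = W @ (Y + G_new - G)
        X, G = X_new, G_new
    return X

# Usage on synthetic data (all values are illustrative assumptions).
rng = np.random.default_rng(0)
m, n_i, d = 5, 20, 5
A = rng.standard_normal((m, n_i, d))
b = A @ np.array([1.0, 0.0, -2.0, 0.0, 0.0]) + 0.01 * rng.standard_normal((m, n_i))
X = prox_gradient_tracking(A, b, lam=0.1, W=ring_weights(m), step=0.05, iters=3000)
```

Because the mixing matrix is doubly stochastic and the tracking variables start at the local gradients, the average of the rows of `Y` equals the average of the local gradients at every iteration. This is what lets each agent follow an estimate of the global gradient using only neighbor communication.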

Paper Structure

This paper contains 24 sections, 16 theorems, 103 equations, 4 figures.

Key Result

Lemma 1

For any $L$-smooth function $u:\mathbb{R}^d\rightarrow \mathbb{R}$, set of weights $\{w_i\}_{i=1}^m$, with $w_i\geq 0$ and $\sum_{i=1}^m w_i=1$, and $x_i\in\mathbb{R}^d$, $i=1,\cdots,m$, the following holds

Figures (4)

  • Figure 1: Distance $\mathcal{P}^\nu := \sqrt{\sum_{i=1}^m\|x_i^{\nu}-x^*\|^2}$ of the iterates ($x_i^{\nu}$, $i\in[m]$) from a stationary solution ($x^*$) of the PCA (upper panel) and LASSO (lower panel) problems, versus the iterations $\nu$.
  • Figure 2: Decentralized PCA: Distance of the iterates from a stationary solution vs. iterations.
  • Figure 3: LASSO with $\ell_1$ regularization: distance of the iterates from a solution vs. iterations.
  • Figure 4: Sparse linear regression with SCAD regularization: Distance from stationarity vs. iterations.
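The distance metric $\mathcal{P}^\nu$ from the Figure 1 caption can be computed as in the short sketch below. The function name `distance_to_solution` and the row-per-agent array layout are assumptions made for illustration.

```python
import numpy as np

def distance_to_solution(X, x_star):
    """P = sqrt(sum_i ||x_i - x_star||^2), with row i of X holding agent i's iterate."""
    X = np.asarray(X, dtype=float)
    x_star = np.asarray(x_star, dtype=float)
    return float(np.sqrt(np.sum((X - x_star) ** 2)))
```

Evaluating this quantity at every iteration and plotting it on a logarithmic vertical axis makes a linear rate appear as a straight line.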

Theorems & Definitions (32)

  • Definition 1
  • Lemma 1
  • Theorem 1
  • Lemma 2
  • proof
  • Theorem 2
  • Corollary 1
  • Lemma 3
  • proof
  • Lemma 4: Exponent for separable sums of KL functions
  • ...and 22 more