Convergence result for the gradient-push algorithm and its application to boost up the Push-DIging algorithm

Hyogi Choi; Woocheol Choi; Gwangil Kim

TL;DR

The paper addresses distributed optimization over directed graphs using the gradient-push algorithm with a constant stepsize. It introduces a contraction-based analysis via an operator $T_{\alpha}$ and proves that for $\alpha\in(0,\alpha_0]$ the iterates converge linearly to a fixed point $w^{\alpha}$ of $T_{\alpha}$. The corresponding local estimates lie in an $O(\alpha)$-neighborhood of the global minimizer $x_*$. Two function classes are handled: (i) each $f_i$ is $\mu_i$-strongly convex and $L_i$-smooth, and (ii) each $f_i$ is convex quadratic and $L_i$-smooth while the aggregate $f$ is strongly convex. In both cases the limit is $O(\alpha)$-accurate, with $\alpha_0 = c/L$ for a constant $c>0$ independent of the smoothness constant $L$. The paper also proposes a hybrid scheme that runs gradient-push first and then switches to Push-DIGing to accelerate convergence, and numerical experiments show that it substantially improves on Push-DIGing alone. Overall, the work gives stepsize ranges for constant-stepsize gradient-push on directed graphs that are sharp in the order of $L$, together with a practical way to speed up Push-DIGing.
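For quadratic local costs the fixed-point picture can be checked directly on a toy problem. In that case the operator is affine, so its fixed point solves a linear system. The sketch below is an illustration under our own assumptions, not the paper's definitions. It assumes that $T_{\alpha}$ acts on the push-sum numerator as $T_{\alpha}(w) = C\,(w - \alpha \nabla F(w/(n\pi)))$, where $C$ is a column-stochastic mixing matrix and $\pi$ is its Perron vector. The graph, the scalar costs $f_i(x) = \tfrac{a_i}{2}(x-b_i)^2$ and the helper name `fixed_point_error` are all hypothetical.

```python
import numpy as np

# Hypothetical column-stochastic mixing matrix C (C[i, j] = 1/d_j if agent j sends
# to agent i, self-loops included) for the directed graph 0->1, 0->2, 1->2, 2->3, 3->0.
C = np.array([[1/3, 0.0, 0.0, 0.5],
              [1/3, 0.5, 0.0, 0.0],
              [1/3, 0.5, 0.5, 0.0],
              [0.0, 0.0, 0.5, 0.5]])
n = C.shape[0]

# Hypothetical quadratic local costs f_i(x) = a_i/2 * (x - b_i)^2, so L = max(a) = 4.
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([1.0, -1.0, 2.0, 0.0])
x_star = a @ b / a.sum()               # minimizer of f = sum_i f_i

# Perron vector pi of C (C pi = pi, entries summing to 1), obtained by mixing the
# uniform vector many times; the push-sum weights y(t) converge to n * pi.
pi = np.linalg.matrix_power(C, 500) @ np.full(n, 1.0 / n)
D_inv = np.diag(1.0 / (n * pi))
A = np.diag(a)

def fixed_point_error(alpha):
    """Solve w = C (w - alpha * A (D^{-1} w - b)) and return max_i |z_i - x_*|."""
    # Assumed form of T_alpha; for quadratics it is affine, so its fixed point
    # solves (I - C + alpha * C A D^{-1}) w = alpha * C A b.
    M = np.eye(n) - C + alpha * C @ A @ D_inv
    w = np.linalg.solve(M, alpha * C @ A @ b)
    z = D_inv @ w                      # local estimates z_i = w_i / (n * pi_i)
    return np.max(np.abs(z - x_star))

for alpha in (1e-2, 1e-3, 1e-4):
    print(f"alpha = {alpha:.0e}:  max_i |z_i - x_*| = {fixed_point_error(alpha):.3e}")
```

Each tenfold reduction of $\alpha$ should shrink the error by roughly a factor of ten, which is the $O(\alpha)$ behaviour stated above.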

Abstract

The gradient-push algorithm is a fundamental algorithm for the distributed optimization problem \begin{equation} \min_{x \in \mathbb{R}^d} f(x) = \sum_{i=1}^n f_i (x), \end{equation} where each local cost $f_i$ is known only to agent $a_i$ for $1 \leq i \leq n$ and the agents are connected by a directed graph. In this paper, we obtain convergence results for the gradient-push algorithm with constant stepsize whose range is sharp in terms of the order of the smoothness constant $L>0$. We consider two settings. In the first, each local cost $f_i$ is strongly convex and $L$-smooth. In the second, each local cost $f_i$ is convex quadratic and $L$-smooth while the aggregate cost $f$ is strongly convex. Under either setting, we show that the gradient-push algorithm with stepsize $\alpha>0$ converges to an $O(\alpha)$-neighborhood of the minimizer of $f$ for all $\alpha \in (0, c/L]$, where $c>0$ is independent of $L$. Building on this result, we propose a hybrid algorithm that runs the gradient-push algorithm with a relatively large stepsize $\alpha>0$ for a number of iterations and then switches to the Push-DIGing algorithm. A numerical test shows that the hybrid algorithm significantly improves the performance of the Push-DIGing algorithm. Numerical tests also support the convergence results for the gradient-push algorithm.
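The hybrid idea can be sketched in a few lines. This is a minimal illustration and not the authors' implementation. Gradient-push is written in the standard push-sum form, and Push-DIGing follows the gradient-tracking update of Nedić, Olshevsky and Shi. Several pieces are our own assumptions: the toy graph, the costs, both stepsizes, the iteration counts and the handoff rule, which restarts Push-DIGing from the current gradient-push estimates $z_i$. The function names are hypothetical.

```python
import numpy as np

# Hypothetical column-stochastic mixing matrix for the directed graph
# 0->1, 0->2, 1->2, 2->3, 3->0 (self-loops included, equal splitting).
C = np.array([[1/3, 0.0, 0.0, 0.5],
              [1/3, 0.5, 0.0, 0.0],
              [1/3, 0.5, 0.5, 0.0],
              [0.0, 0.0, 0.5, 0.5]])
a = np.array([1.0, 2.0, 3.0, 4.0])    # hypothetical costs f_i(x) = a_i/2 * (x - b_i)^2
b = np.array([1.0, -1.0, 2.0, 0.0])
x_star = a @ b / a.sum()

def grad(x):
    """Stacked local gradients: entry i is f_i'(x_i)."""
    return a * (x - b)

def gradient_push(x, alpha, iters):
    """Push-sum gradient method with constant stepsize; returns the estimates z."""
    y = np.ones_like(x)
    z = x / y
    for _ in range(iters):
        w = C @ x                      # push the numerators
        y = C @ y                      # push the weights
        z = w / y                      # de-biased local estimates
        x = w - alpha * grad(z)        # local gradient step
    return z

def push_diging(x0, alpha, iters):
    """Push-DIGing (gradient tracking with push-sum weights) started at x0."""
    u, v = x0.copy(), np.ones_like(x0)
    x = u / v
    g = grad(x)
    y = g.copy()                       # tracker starts at the local gradients
    for _ in range(iters):
        u = C @ (u - alpha * y)
        v = C @ v
        x_new = u / v
        g_new = grad(x_new)
        y = C @ y + g_new - g          # keeps sum(y) equal to the sum of current gradients
        x, g = x_new, g_new
    return x

# Phase 1: large-stepsize gradient-push quickly reaches an O(alpha) neighborhood.
z = gradient_push(np.zeros(4), alpha=0.05, iters=200)
# Phase 2: Push-DIGing with a smaller stepsize, warm-started at z, removes the bias.
x_final = push_diging(z, alpha=0.002, iters=10000)
print("error after gradient-push:", np.max(np.abs(z - x_star)))
print("error after Push-DIGing:  ", np.max(np.abs(x_final - x_star)))
```

The design choice reflects the stepsize gap reported in the experiments. Gradient-push tolerates a much larger stepsize and covers most of the distance cheaply. Push-DIGing, which converges to $x_*$ exactly but needs a smaller stepsize, only has to remove the remaining $O(\alpha)$ error.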

Paper Structure (11 sections, 18 theorems, 124 equations, 7 figures, 1 table, 1 algorithm)

Introduction
New results for the gradient-push algorithm with constant stepsize
Combining the gradient-push with the Push-DIGing
Convergence results
Properties of $T_{\alpha}$
Convergence of the gradient-push algorithm to the fixed point of $T_{\alpha}$
Estimate for the distance between $w^\alpha$ and $x_*$
Simulation
Contraction property of the mapping $T_{\alpha}$
Convergence results for the strongly convex case (Case 1)
Convergence results for the convex case (Case 2)

Key Result

Theorem 1.2

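Suppose one of the following conditions holds: (1) each local cost $f_i$ is strongly convex and $L$-smooth; (2) each local cost $f_i$ is convex quadratic and $L$-smooth, and the aggregate cost $f$ is strongly convex. Then for each $1\leq k \leq n$, the sequence $\{z_k (t)\}_{t \geq 0}$ generated by the gradient-push algorithm (1.2) with stepsize $\alpha \in (0,\alpha_0]$ converges linearly to an $O(\alpha)$-neighborhood of the minimizer of $f$. Here $\alpha_0 = c/L$ for a constant $c>0$ independent of $L$.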
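Theorem 1.2 can be checked empirically on a toy instance of condition (1). The sketch below is an illustration under our own assumptions. Gradient-push is written in the standard push-sum form: numerators are pushed through a column-stochastic matrix $C$, weights $y$ are pushed alongside, and the estimates are $z = w/y$. The graph, the costs, the stepsizes and the function name `run_gradient_push` are hypothetical and are not taken from the paper's experiments. The sketch records $\max_i |z_i(t) - z_i(t-1)|$, whose geometric decay indicates linear convergence, together with the final distance to $x_*$ for two stepsizes.

```python
import numpy as np

# Hypothetical column-stochastic mixing matrix for the directed graph
# 0->1, 0->2, 1->2, 2->3, 3->0 (self-loops included, equal splitting).
C = np.array([[1/3, 0.0, 0.0, 0.5],
              [1/3, 0.5, 0.0, 0.0],
              [1/3, 0.5, 0.5, 0.0],
              [0.0, 0.0, 0.5, 0.5]])
a = np.array([1.0, 2.0, 3.0, 4.0])    # f_i(x) = a_i/2 (x - b_i)^2: a_i-strongly convex, L = 4
b = np.array([1.0, -1.0, 2.0, 0.0])
x_star = a @ b / a.sum()

def run_gradient_push(alpha, iters=4000):
    """Return the final estimates z and the sequence max_i |z_i(t) - z_i(t-1)|."""
    x = np.zeros(4)
    y = np.ones(4)
    z_prev = x / y
    steps = []
    for _ in range(iters):
        w = C @ x                      # push the numerators
        y = C @ y                      # push the weights
        z = w / y                      # local estimates
        x = w - alpha * a * (z - b)    # gradient step with f_i'(z_i) = a_i (z_i - b_i)
        steps.append(np.max(np.abs(z - z_prev)))
        z_prev = z
    return z, np.array(steps)

for alpha in (0.02, 0.002):
    z, steps = run_gradient_push(alpha)
    print(f"alpha = {alpha}: max_i |z_i - x_*| = {np.max(np.abs(z - x_star)):.2e}, "
          f"last change = {steps[-1]:.1e}")
```

The smaller stepsize should end closer to $x_*$ but needs more iterations to settle. This trade-off is what motivates running gradient-push with a large stepsize before switching to Push-DIGing.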

Figures (7)

Figure 1: The flow of the main theorems for proving the convergence of the gradient-push algorithm (1.2)
Figure 2: Left: The graphs of $\log_{10}(\sum_{k=1}^n \|z_k (t)-x_*\|)$ for the gradient-push ($\alpha_0 = 0.0297$). Right: The graphs of $\log_{10}(\sum_{k=1}^n \|z_k (t)-x_*\|)$ for the Push-DIGing ($\alpha_1 = 0.001175$).
Figure 3: The graph of $\mathcal{L}_{\alpha}$ with respect to $\alpha \in (0,2\alpha_0]$. Left: Case 1. Right: Case 2.
Figure 4: Left: The graph of $\log_{10}\Vert w_{\alpha}(t) - w^\alpha \Vert_{\pi\otimes1_d}$ for the least-squares problem with stepsize $\alpha = \alpha_0$ over iterations $t=1,\cdots,1000$. Right: The graph of the error $\Vert w^\alpha-n\pi \otimes x_* \Vert_{\pi\otimes 1_d}$ for stepsizes $\alpha \in (0,\alpha_0]$.
Figure 5: Both graphs show $\log_{10}\|w_{\alpha}(t) - n\pi \otimes x_*\|_{\pi \otimes 1_d}$ for the regularized least-squares problem with various stepsizes. Left: convergent cases. Right: a divergent case.
...and 2 more figures

Theorems & Definitions (38)

Definition 1.1
Theorem 1.2
Definition 1.3
Theorem 2.1: CKY2
Remark 2.2
Theorem 2.3
Theorem 2.4
Theorem 2.5
Theorem 2.6
Lemma 3.1
...and 28 more
