A Communication-Efficient Stochastic Gradient Descent Algorithm for Distributed Nonconvex Optimization

Antai Xie; Xinlei Yi; Xiaofan Wang; Ming Cao; Xiaoqiang Ren

TL;DR

This paper proposes a distributed stochastic gradient descent algorithm, suitable for a general class of compressors, and shows that the proposed algorithm achieves the linear speedup convergence rate $\mathcal{O}(1/\sqrt{nT})$ for smooth nonconvex functions.
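To make "linear speedup" concrete, the sketch below evaluates the rate expression $1/\sqrt{nT}$ with all constants dropped. This is an assumption: the actual bound in the paper carries problem-dependent constants and lower-order terms. Because the expression depends on $n$ and $T$ only through the product $nT$, reaching a target accuracy $\epsilon$ takes roughly $T \approx 1/(n\epsilon^2)$ iterations, so adding agents cuts the required iteration count proportionally. The function names are illustrative only.

```python
import math


def rate_bound(n: int, T: int) -> float:
    """Leading-order value of the O(1/sqrt(nT)) rate, with all constants dropped."""
    if n <= 0 or T <= 0:
        raise ValueError("n and T must be positive")
    return 1.0 / math.sqrt(n * T)


def iterations_for_accuracy(n: int, eps: float) -> int:
    """Integer T (rounded up) satisfying T >= 1 / (n * eps**2), i.e. 1/sqrt(nT) <= eps."""
    if n <= 0 or eps <= 0:
        raise ValueError("n and eps must be positive")
    return math.ceil(1.0 / (n * eps**2))


if __name__ == "__main__":
    # More agents -> proportionally fewer iterations for the same target accuracy.
    for n in (1, 6, 24):
        print(f"n={n:>2}  T needed for eps=0.01: {iterations_for_accuracy(n, 0.01)}")
```

Only the product $nT$ matters here, so 6 agents running $T$ iterations match a single agent running $6T$ iterations, up to the dropped constants.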

Abstract

This paper studies distributed nonconvex optimization problems with stochastic gradients for a multi-agent system, in which each agent aims to minimize the sum of all agents' cost functions by using local compressed information exchange. We propose a distributed stochastic gradient descent (SGD) algorithm, suitable for a general class of compressors. We show that the proposed algorithm achieves the linear speedup convergence rate $\mathcal{O}(1/\sqrt{nT})$ for smooth nonconvex functions, where $T$ and $n$ are the number of iterations and agents, respectively. If the global cost function additionally satisfies the Polyak--Łojasiewicz condition, the proposed algorithm can linearly converge to a neighborhood of the global optimum, regardless of whether the stochastic gradient is unbiased or not. Numerical experiments are carried out to verify the efficiency of our algorithm.
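The abstract refers to "a general class of compressors" without defining it in this excerpt. As a hedged illustration, top-$k$ sparsification is a common example of a contractive compressor. Keeping the $k$ largest-magnitude entries of $x\in\mathbb{R}^d$ gives $\|C(x)-x\|^2\le(1-k/d)\|x\|^2$, because the discarded entries are the $d-k$ smallest in magnitude. Whether and how top-$k$ fits the paper's own compressor assumption (as:compressor, which appears to involve a scaling constant $r$) is not shown here, so the check below uses only this standard top-$k$ bound. The helper names are hypothetical.

```python
import random
from typing import List


def top_k(x: List[float], k: int) -> List[float]:
    """Top-k sparsification: keep the k largest-magnitude entries, zero the rest."""
    if not 0 < k <= len(x):
        raise ValueError("k must satisfy 0 < k <= len(x)")
    keep = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:k]
    out = [0.0] * len(x)
    for i in keep:
        out[i] = x[i]
    return out


def sq_norm(v: List[float]) -> float:
    """Squared Euclidean norm."""
    return sum(t * t for t in v)


def contraction_holds(x: List[float], k: int, tol: float = 1e-12) -> bool:
    """Check ||C(x) - x||^2 <= (1 - k/d) ||x||^2 for C = top_k."""
    err = sq_norm([a - c for a, c in zip(x, top_k(x, k))])
    return err <= (1.0 - k / len(x)) * sq_norm(x) + tol


if __name__ == "__main__":
    print(top_k([3.0, -1.0, 0.5, 4.0], 2))  # → [3.0, 0.0, 0.0, 4.0]
    rng = random.Random(0)
    trials = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(1000)]
    print(all(contraction_holds(x, 5) for x in trials))  # → True
```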

Paper Structure (17 sections, 9 theorems, 64 equations, 3 figures, 1 table, 1 algorithm)

Introduction
Preliminaries and Problem Formulation
Distributed Optimization
Graph Theory
Assumptions
Compression Method
Compressed Primal--Dual SGD Algorithm
Algorithm Description
Convergence Analysis of CP-SGD
Simulation
Conclusion
Supporting Lemmas
The Proof of Theorem 1
Notations and Useful Lemma
The Proof of Theorem 1
...and 2 more sections

Key Result

Theorem 1

Suppose Assumptions as:strongconnected–as:boundedvar and as:compressor hold. In the CP-SGD algorithm (Algorithm 1), let $\gamma_k=\beta_1\omega_k$, $\eta_k=\beta_2/\omega_k$, and $\omega_k=\omega>\beta_3$, and let $\alpha_x\in(0,1/r)$ for all $k\in\mathbb{N}$, where $\beta_1>c_0$ and $\beta_2>0$, with $c_0$, …
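The step-size coupling in Theorem 1 can be sketched as a small parameter constructor. The constants $c_0$, $\beta_3$ and $r$ are defined in the paper, but their definitions are cut off in this excerpt, so they are treated here as plain inputs. The function name and validation logic are hypothetical and check only the conditions visible above. Under this choice, $\gamma_k\eta_k=\beta_1\beta_2$ for every $\omega$, so increasing $\omega$ raises $\gamma_k$ and lowers $\eta_k$ by the same factor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CPSGDParams:
    """Constant parameters of the form stated in Theorem 1 (hypothetical container)."""
    gamma: float    # gamma_k = beta1 * omega
    eta: float      # eta_k = beta2 / omega
    omega: float    # omega_k = omega, constant over k
    alpha_x: float  # alpha_x in (0, 1/r)


def make_params(beta1: float, beta2: float, omega: float, alpha_x: float,
                c0: float, beta3: float, r: float) -> CPSGDParams:
    """Validate the visible conditions of Theorem 1 and build the constant parameters.

    c0, beta3 and r are problem-dependent constants from the paper; here they are inputs.
    """
    if not beta1 > c0:
        raise ValueError("need beta1 > c0")
    if not beta2 > 0:
        raise ValueError("need beta2 > 0")
    if not omega > beta3:
        raise ValueError("need omega > beta3")
    if not r > 0 or not 0 < alpha_x < 1 / r:
        raise ValueError("need r > 0 and alpha_x in (0, 1/r)")
    return CPSGDParams(gamma=beta1 * omega, eta=beta2 / omega, omega=omega, alpha_x=alpha_x)


if __name__ == "__main__":
    p = make_params(beta1=2.0, beta2=0.5, omega=4.0, alpha_x=0.1, c0=1.0, beta3=3.0, r=2.0)
    print(p, p.gamma * p.eta)  # gamma * eta equals beta1 * beta2 = 1.0
```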

Figures (3)

Figure 1: A connected undirected graph consisting of 6 agents.
Figure 2: The evolution of the residual under DSGD, Choco-SGD, and CP-SGD.
Figure 3: The evolution of the residual with respect to the transmitted bits under DSGD, Choco-SGD, and CP-SGD.
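Figure 3 plots the residual against transmitted bits, but this excerpt does not state how bits are counted. The sketch below therefore uses an assumed accounting purely for illustration:

- A dense message costs 32 bits per coordinate.
- A top-$k$ message costs $k$ values of 32 bits plus $k$ indices of $\lceil\log_2 d\rceil$ bits.
- Each agent sends one message per iteration.

The function names and this encoding are hypothetical and need not match the paper's experiments.

```python
import math


def dense_bits(d: int, float_bits: int = 32) -> int:
    """Bits for sending a full d-dimensional vector of floats."""
    return d * float_bits


def top_k_bits(d: int, k: int, float_bits: int = 32) -> int:
    """Bits for k (value, index) pairs; each index uses ceil(log2 d) bits (at least 1)."""
    index_bits = max(1, math.ceil(math.log2(d)))
    return k * (float_bits + index_bits)


def cumulative_bits(bits_per_message: int, n_agents: int, iterations: int) -> int:
    """Total bits if every agent sends one message per iteration (assumed protocol)."""
    return bits_per_message * n_agents * iterations


if __name__ == "__main__":
    d, k, n = 1000, 10, 6  # n = 6 matches the graph in Figure 1; d and k are arbitrary
    for name, per_msg in (("dense", dense_bits(d)), ("top-k", top_k_bits(d, k))):
        print(name, per_msg, cumulative_bits(per_msg, n, iterations=100))
```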

Theorems & Definitions (18)

Remark 1
Remark 2
Theorem 1
Proof
Corollary 1
Remark 3
Theorem 2
Proof
Remark 4
Theorem 3
...and 8 more
