A Communication-Efficient Stochastic Gradient Descent Algorithm for Distributed Nonconvex Optimization

Antai Xie, Xinlei Yi, Xiaofan Wang, Ming Cao, Xiaoqiang Ren

TL;DR

This paper proposes a distributed stochastic gradient descent algorithm, suitable for a general class of compressors, and shows that the proposed algorithm achieves the linear speedup convergence rate of $\mathcal{O}(1/\sqrt{nT})$ for smooth nonconvex functions.
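
To make the term "linear speedup" concrete, the short calculation below shows how the iteration count scales with the number of agents under a rate of this form. The target accuracy $\epsilon$ and the constant $C$ are illustrative symbols that do not appear in the summary, and the quantity being bounded is whatever convergence measure the paper uses.

$$
\frac{C}{\sqrt{nT}} \le \epsilon
\quad\Longleftrightarrow\quad
T \ge \frac{C^2}{n\,\epsilon^2}.
$$

Under this reading, doubling the number of agents $n$ halves the number of iterations $T$ needed to reach the same accuracy, which is the sense in which the speedup is linear in $n$.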

Abstract

This paper studies distributed nonconvex optimization problems with stochastic gradients for a multi-agent system, in which each agent aims to minimize the sum of all agents' cost functions by using local compressed information exchange. We propose a distributed stochastic gradient descent (SGD) algorithm, suitable for a general class of compressors. We show that the proposed algorithm achieves the linear speedup convergence rate $\mathcal{O}(1/\sqrt{nT})$ for smooth nonconvex functions, where $T$ and $n$ are the number of iterations and agents, respectively. If the global cost function additionally satisfies the Polyak--Łojasiewicz condition, the proposed algorithm can linearly converge to a neighborhood of the global optimum, regardless of whether the stochastic gradient is unbiased or not. Numerical experiments are carried out to verify the efficiency of our algorithm.
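
As a rough illustration of what "compressed information exchange" can look like, here is a minimal sketch of top-k sparsification, a compressor commonly used in the communication-efficient optimization literature. The summary does not say which compressors the paper's general class covers, so the choice of top-k, the function name `top_k`, and its parameters are assumptions made for illustration only, not the paper's method.

```python
import numpy as np


def top_k(x, k):
    """Hypothetical example compressor: keep the k largest-magnitude
    entries of a 1-D vector and set the rest to zero."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    if k <= 0:
        return out
    k = min(k, x.size)
    # Indices of the k entries with the largest absolute value.
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out


if __name__ == "__main__":
    x = np.array([3.0, -5.0, 1.0, 0.5])
    cx = top_k(x, 2)
    print(cx)
    # Top-k is a contraction: the squared compression error is at most
    # (1 - k/d) times the squared norm of the input.
    err = np.sum((cx - x) ** 2)
    bound = (1 - 2 / x.size) * np.sum(x ** 2)
    print(err <= bound)
```

An agent using such a compressor would transmit only the k retained values and their indices instead of the full vector, which is where the communication saving comes from.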

Paper Structure (17 sections, 9 theorems, 64 equations, 3 figures, 1 table, 1 algorithm)

Key Result

Theorem 1

Suppose Assumptions as:strongconnected--as:boundedvar and as:compressor hold and in Algorithm Al:CP-SGD, let $\gamma_k=\beta_1\omega_k, ~\eta_k=\frac{\beta_2}{\omega_k},~\omega_k=\omega>\beta_3$, and $\alpha_x\in(0,\frac{1}{r})$, $\forall k\in\mathbb{N}$ where $~\beta_1>c_0,~\beta_2>0$ with $c_0,\be

Figures (3)

  • Figure 1: A connected undirected graph consisting of 6 agents.
  • Figure 2: The evolution of residual under DSGD, Choco-SGD, and CP-SGD.
  • Figure 3: The evolution of residual with respect to the transmitted bits under DSGD, Choco-SGD, and CP-SGD.

Theorems & Definitions (18)

  • Remark 1
  • Remark 2
  • Theorem 1
  • proof
  • Corollary 1
  • Remark 3
  • Theorem 2
  • proof
  • Remark 4
  • Theorem 3
  • ...and 8 more