Shadowheart SGD: Distributed Asynchronous SGD with Optimal Time Complexity Under Arbitrary Computation and Communication Heterogeneity

Alexander Tyurin, Marta Pozzi, Ivan Ilin, Peter Richtárik

TL;DR

Using an unbiased compression technique, the authors develop a new method, Shadowheart SGD, that provably improves the time complexities of all previous centralized methods, and show that its time complexity is optimal in the family of centralized methods with compressed communication.

Abstract

We consider nonconvex stochastic optimization problems in the asynchronous centralized distributed setup where the communication times from workers to a server cannot be ignored, and the computation and communication times are potentially different for all workers. Using an unbiased compression technique, we develop a new method, Shadowheart SGD, that provably improves the time complexities of all previous centralized methods. Moreover, we show that the time complexity of Shadowheart SGD is optimal in the family of centralized methods with compressed communication. We also consider the bidirectional setup, where broadcasting from the server to the workers is non-negligible, and develop a corresponding method.
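
The abstract relies on an unbiased compression technique without naming a specific compressor. As an illustration only, the sketch below implements Rand-K sparsification, a standard unbiased compressor: it keeps k of the d coordinates chosen uniformly at random and scales them by d/k, so the compressed vector equals the input in expectation. The function name `rand_k`, the test vector, and the trial count are assumptions made for this example and are not taken from the paper.

```python
import random


def rand_k(x, k, rng):
    """Rand-K sparsifier (illustrative): keep k random coordinates, scale by d/k."""
    d = len(x)
    kept = set(rng.sample(range(d), k))  # k distinct indices, uniform at random
    scale = d / k                        # rescaling makes the compressor unbiased
    return [scale * v if i in kept else 0.0 for i, v in enumerate(x)]


# Empirical check of unbiasedness: the average of many compressed
# copies should approach the original vector.
rng = random.Random(0)
x = [1.0, -2.0, 3.0, 0.5]   # hypothetical gradient vector
trials = 20000
mean = [0.0] * len(x)
for _ in range(trials):
    c = rand_k(x, 2, rng)
    mean = [m + ci / trials for m, ci in zip(mean, c)]

print(mean)  # close to x, up to sampling noise
```

Each compressed message carries only k nonzero values, which is what reduces the worker-to-server communication time; the price is extra variance that the method must average out.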

Paper Structure

This paper contains 44 sections, 30 theorems, 294 equations, 9 figures, 1 table, and 9 algorithms.

Key Result

Theorem 4.1

Let Assumptions ass:lipschitz_constant, ass:lower_bound, ass:stochastic_variance_bounded, and ass:independence hold. Let us take $\gamma = 1/(2L)$ in Shadowheart SGD (Alg. alg:alg_server). Then as long as $K \geq 16 L \Delta / \varepsilon$, we have the guarantee $\frac{1}{K}\sum_{k=0}^{K-1}\mathbb{E}[\,\ldots\,]$ ...
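
To make the two quantities in Theorem 4.1 concrete, the sketch below evaluates the step size $\gamma = 1/(2L)$ and the smallest integer $K$ satisfying $K \geq 16 L \Delta / \varepsilon$. The numeric values of `L`, `delta`, and `eps` are hypothetical, and $\Delta$ is assumed here to be the initial function-value gap, which this excerpt does not define. The result is an iteration count, not a wall-clock time.

```python
import math


def theorem_4_1_parameters(L, delta, eps):
    """Return (step size, minimum iteration count) from the stated conditions."""
    gamma = 1.0 / (2.0 * L)                    # gamma = 1 / (2L)
    k_min = math.ceil(16.0 * L * delta / eps)  # smallest integer K >= 16 L Delta / eps
    return gamma, k_min


# Hypothetical problem constants, chosen only for illustration.
gamma, k_min = theorem_4_1_parameters(L=2.0, delta=5.0, eps=0.25)
print(gamma, k_min)  # 0.25 640
```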

Figures (9)

  • Figure 1: SGDone starts to slow down relative to Shadowheart SGD and other methods when we increase the noise.
  • Figure 2: The non-compressed methods Asynchronous SGD and Minibatch SGD slow down relative to Shadowheart SGD when we increase the communication times.
  • Figure 3: Shadowheart SGD improves when we decrease the computation times from $\sqrt{i}$ to $1.$
  • Figure 4: $h_i^k, \dot{\tau}_i^k \sim U(0.1,1)$
  • Figure 5: $h_i^k, \dot{\tau}_i^k \sim U(0.1,1)$
  • ...and 4 more figures

Theorems & Definitions (89)

  • Definition 2.1
  • Definition 3.1: Equilibrium Time
  • Theorem 4.1
  • Corollary 4.1
  • Theorem 4.2
  • Definition 4.3
  • Corollary 4.3
  • Example 6.0
  • Example 6.0
  • Example 6.0
  • ...and 79 more