DRACO: Decentralized Asynchronous Federated Learning over Row-Stochastic Wireless Networks

Eunjeong Jeong, Marios Kountouris

TL;DR

This paper introduces DRACO, a framework for decentralized asynchronous Stochastic Gradient Descent (SGD) over row-stochastic gossip wireless networks. It shows that DRACO achieves high performance in decentralized optimization while keeping variance across users low, even without predefined scheduling policies.

Abstract

Recent developments and emerging use cases, such as smart Internet of Things (IoT) and Edge AI, have sparked considerable interest in training neural networks over fully decentralized (serverless) networks. A major challenge in decentralized learning is ensuring stable convergence without resorting to strong per-agent assumptions on data distributions or updating policies. To address this, we propose DRACO, a novel method for decentralized asynchronous Stochastic Gradient Descent (SGD) over row-stochastic gossip wireless networks that leverages continuous communication. Our approach enables edge devices within decentralized networks to perform local training and model exchange along a continuous timeline, thereby eliminating the need for synchronized timing. The algorithm also decouples the communication and computation schedules, which gives all users complete autonomy and keeps instructions for stragglers manageable. Through a comprehensive convergence analysis, we highlight the advantages of asynchronous and autonomous participation in decentralized optimization. Our numerical experiments corroborate the efficacy of the proposed technique.
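
To make the two ideas in the abstract concrete (row-stochastic gossip, and computation decoupled from communication on a continuous timeline), here is a minimal toy sketch. It is not the authors' algorithm: the quadratic local losses, the Poisson event model, the uniform in-neighbor weights, and every name (`row_stochastic_weights`, `simulate`, `grad_rate`, `comm_rate`) are assumptions chosen for illustration only.

```python
import random


def row_stochastic_weights(in_neighbors):
    """Build a mixing matrix W where row i puts uniform weight on agent i
    and its in-neighbors. Each row sums to 1; columns need not."""
    n = len(in_neighbors)
    W = [[0.0] * n for _ in range(n)]
    for i, nbrs in enumerate(in_neighbors):
        members = set(nbrs) | {i}
        for j in members:
            W[i][j] = 1.0 / len(members)
    return W


def simulate(targets, in_neighbors, horizon=200.0, grad_rate=1.0,
             comm_rate=1.0, lr=0.1, seed=0):
    """Toy event-driven run (assumed model, not DRACO itself).

    Agent i holds a scalar model x[i] and a local loss 0.5*(x - targets[i])**2.
    Gradient steps and aggregation steps fire as independent Poisson events,
    so no agent waits for a global round.
    """
    rng = random.Random(seed)
    n = len(targets)
    W = row_stochastic_weights(in_neighbors)
    x = [0.0] * n
    t = 0.0
    total_rate = n * (grad_rate + comm_rate)
    while True:
        # Next event time of the merged Poisson process.
        t += rng.expovariate(total_rate)
        if t > horizon:
            break
        i = rng.randrange(n)
        if rng.random() < grad_rate / (grad_rate + comm_rate):
            # Computation event: one noisy local gradient step.
            grad = (x[i] - targets[i]) + rng.gauss(0.0, 0.01)
            x[i] -= lr * grad
        else:
            # Communication event: agent i averages with row i of W.
            x[i] = sum(W[i][j] * x[j] for j in range(n))
    return x, W


if __name__ == "__main__":
    # Directed graph: agent 0 listens to 1 and 2, agent 1 to 0, and so on.
    models, W = simulate([1.0, 2.0, 3.0, 4.0], [[1, 2], [0], [1], [2]])
    print(models)
```

Because each agent only normalizes the weights of the messages it receives, the rows of `W` sum to one while the columns generally do not. This is what distinguishes a row-stochastic network from the doubly stochastic setting.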

Paper Structure (17 sections, 8 theorems, 40 equations, 9 figures, 2 algorithms)


Key Result

Lemma 4.1

(Deviation of local gradients) When $N>4$, for all $\mathbf{x},\ t$,

Figures (9)

  • Figure 1: A schematic view of DRACO's timelines with comparisons. (a) Synchronous FL; (b) asynchronous FL with transmission delay deadline; (c) (in DRACO) fully asynchronous FL with delay deadline, but the iteration count is continuous; (d) sequential computation and communication over a doubly stochastic network; (e) timelines of DRACO with decoupled computation and communication over a row-stochastic network. If two messages arrive at the same agent with a negligibly small time gap (circled in red), they are considered simultaneous and are used for the same model aggregation step. The concept of the superposition window is elaborated in Section \ref{subsect:commsys}.
  • Figure 2: The proposed algorithm (DRACO) in a chain graph illustrating the states of possible actions within each agent $i$.
  • Figure 3: Performance comparison with the literature under (a) EMNIST dataset, and (b) Poker hand dataset.
  • Figure 4: Results for different upper bounds on the number of received messages per user ($\Gamma_\text{max}=10$).
  • Figure : User-centric algorithm of DRACO. A pseudo-algorithm for source code reproduction is provided in Appendix \ref{appx:psuedo-alg}.
  • ...and 4 more figures
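
The Figure 1 caption says that messages reaching the same agent within a negligibly small time gap are treated as simultaneous and feed one aggregation step. A minimal sketch of one way to batch arrivals follows. The function name `group_arrivals`, the `window` parameter, and the rule of measuring the gap from the first message of each batch are illustrative assumptions; the paper's own definition of the superposition window is in the section the caption references.

```python
def group_arrivals(arrivals, window):
    """Group (time, sender) messages into batches for aggregation.

    Assumed rule: a message joins the current batch if it arrives within
    `window` of that batch's first message; otherwise it opens a new batch.
    """
    batches = []
    for t, sender in sorted(arrivals):
        if batches and t - batches[-1][0][0] <= window:
            batches[-1].append((t, sender))
        else:
            batches.append([(t, sender)])
    return batches
```

Each returned batch would then be used for a single model aggregation step at the receiving agent.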

Theorems & Definitions (16)

  • Definition 1
  • Lemma 4.1
  • proof
  • Theorem 1
  • Proposition A.1
  • proof
  • Proposition A.2
  • proof
  • Proposition A.3
  • proof
  • ...and 6 more