Overlay-based Decentralized Federated Learning in Bandwidth-limited Networks

Yudi Huang, Tingyang Sun, Ting He

TL;DR

This work addresses the communication demands and the communication schedule for overlay-based DFL in bandwidth-limited networks without requiring explicit cooperation from the underlying network by leveraging recent advances in network tomography.

Abstract

The emerging machine learning paradigm of decentralized federated learning (DFL) has the promise of greatly boosting the deployment of artificial intelligence (AI) by directly learning across distributed agents without centralized coordination. Despite significant efforts on improving the communication efficiency of DFL, most existing solutions were based on the simplistic assumption that neighboring agents are physically adjacent in the underlying communication network, which fails to correctly capture the communication cost when learning over a general bandwidth-limited network, as encountered in many edge networks. In this work, we address this gap by leveraging recent advances in network tomography to jointly design the communication demands and the communication schedule for overlay-based DFL in bandwidth-limited networks without requiring explicit cooperation from the underlying network. By carefully analyzing the structure of our problem, we decompose it into a series of optimization problems that can each be solved efficiently, to collectively minimize the total training time. Extensive data-driven simulations show that our solution can significantly accelerate DFL in comparison with state-of-the-art designs.
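
To make the abstract's central point concrete, the following is a minimal sketch (not the paper's algorithm) of why overlay neighbors that are not physically adjacent can be costly. Everything in it is an illustrative assumption: the toy topology with agents A, B, C, D attached to underlay nodes h1 and h2, the capacities, the fixed routing paths, and the simplified rule that each flow receives an equal share of every underlay link it crosses. The function name `iteration_time` is hypothetical.

```python
from collections import Counter

# Hypothetical underlay link capacities in Mbit/s (made-up numbers).
# Agents A and B attach to h1; agents C and D attach to h2.
capacity = {
    ("A", "h1"): 100.0, ("B", "h1"): 100.0,
    ("C", "h2"): 100.0, ("D", "h2"): 100.0,
    ("h1", "h2"): 10.0,  # bandwidth-limited link between the two sides
}

# Assumed underlay routing path for each possible overlay link.
paths = {
    ("A", "B"): [("A", "h1"), ("B", "h1")],
    ("C", "D"): [("C", "h2"), ("D", "h2")],
    ("A", "C"): [("A", "h1"), ("h1", "h2"), ("C", "h2")],
    ("A", "D"): [("A", "h1"), ("h1", "h2"), ("D", "h2")],
    ("B", "C"): [("B", "h1"), ("h1", "h2"), ("C", "h2")],
    ("B", "D"): [("B", "h1"), ("h1", "h2"), ("D", "h2")],
}


def iteration_time(overlay_links, model_size_mbit):
    """Time for the slowest flow to finish one model exchange.

    Simplifying assumption: one unicast flow per overlay link, and every
    underlay link splits its capacity equally among the flows crossing it.
    """
    # Number of flows crossing each underlay link.
    load = Counter(link for ol in overlay_links for link in paths[ol])
    # A flow's rate is limited by its most congested underlay link.
    rates = {
        ol: min(capacity[link] / load[link] for link in paths[ol])
        for ol in overlay_links
    }
    return max(model_size_mbit / rate for rate in rates.values())


# Two overlay rings with the same number of links and the same node degrees.
ring = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")]
crossed_ring = [("A", "C"), ("B", "C"), ("B", "D"), ("A", "D")]

print(iteration_time(ring, 100.0))          # two flows share h1-h2
print(iteration_time(crossed_ring, 100.0))  # four flows share h1-h2
```

Under these assumptions the two rings look identical at the overlay level, yet the second one routes every flow over the h1-h2 link and takes twice as long per exchange. This is the kind of underlay-induced cost that a design assuming physically adjacent neighbors would miss.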

Paper Structure

This paper contains 36 sections, 7 theorems, 37 equations, 15 figures, 5 tables, 1 algorithm.

Key Result

Lemma 3.1

Given a feasible routing solution $\bm{z}$ to eq:min-time, define [...] as the number of activated unicast flows. (Footnote: Although the logical demands are a set of multicast flows as in eq:demands H, the multicast operations can only be performed at overlay nodes, according to logical multicast trees formed by overlay links. Each hop in such a tree, corresponding to some $z^h_{ij}$ [...]) [...] achieved by equally sharing the [...]

Figures (15)

  • Figure 1: Overlay-based DFL.
  • Figure 2: Underlay-aware communication schedule optimization (learning agents: $\{A,B,C,D\}$; underlay nodes: $\{h_1,h_2\}$).
  • Figure 3: Challenge for in-overlay aggregation (learning agents: $\{A,B,C,D,E\}$; underlay nodes: $\{h_1,h_2\}$).
  • Figure 4: Workflow of overall solution.
  • Figure 5: Underlay network topologies.
  • ...and 10 more figures

Theorems & Definitions (14)

  • Lemma 3.1
  • Definition 1: Huang23MobiHoc
  • Lemma 3.2
  • Theorem 3.3
  • Corollary 3.4
  • Lemma 3.5
  • Lemma 3.6
  • Theorem 3.7
  • Proof of Lemma \ref{lem:equal bandwidth allocation}
  • Proof of Lemma \ref{lem:equal bandwidth allocation - category}
  • ...and 4 more