Towards Practical Overlay Networks for Decentralized Federated Learning

Yifan Hua, Jinlong Pang, Xiaoxue Zhang, Yi Liu, Xiaofeng Shi, Bao Wang, Yang Liu, Chen Qian

TL;DR

FedLay introduces a fully decentralized overlay for Decentralized Federated Learning that achieves fast model convergence, high accuracy, and low communication with resilience to node churn. It constructs near-random regular topologies via multiple virtual ring spaces and greedy routing, enabling decentralized neighbor discovery and maintenance without a central server. A two-part protocol stack combines Neighbor Discovery and Maintenance Protocols (NDMP) for topology upkeep with a Model Exchange Protocol (MEP) that uses confidence-weighted asynchronous exchanges and model fingerprinting to mitigate low-quality transmissions. Empirical results on real deployments, emulations, and simulations demonstrate FedLay outperforms existing DFL overlays in convergence speed, accuracy, and resilience while maintaining modest communication costs, making practical DFL with decentralized topology feasible.
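To make the "multiple virtual ring spaces" idea concrete, here is a minimal sketch of how such a topology could be assembled. It rests on assumptions the summary does not spell out: each node draws an independent random coordinate in every ring space, and it links to its two ring-adjacent nodes in each space. The function name `build_overlay` and its parameters are hypothetical, and the sketch builds the graph centrally for illustration only, whereas FedLay itself does this with decentralized protocols.

```python
import random


def build_overlay(node_ids, num_spaces, seed=0):
    """Illustrative only: union of ring adjacencies over several ring spaces.

    Assumption (not stated in the summary): every node has one random
    coordinate per ring space and connects to its predecessor and
    successor on each ring.
    """
    rng = random.Random(seed)
    neighbors = {v: set() for v in node_ids}
    for _ in range(num_spaces):
        # Fresh, independent coordinates in [0, 1) for this ring space.
        coords = {v: rng.random() for v in node_ids}
        order = sorted(node_ids, key=coords.get)
        n = len(order)
        for i, v in enumerate(order):
            # Predecessor and successor on the ring (wrapping around).
            for w in (order[i - 1], order[(i + 1) % n]):
                if w != v:
                    neighbors[v].add(w)
    return neighbors


overlay = build_overlay(list(range(20)), num_spaces=3)
degrees = [len(peers) for peers in overlay.values()]
```

Under these assumptions every node ends up with at most `2 * num_spaces` neighbors, and with fewer only when the same pair of nodes happens to be adjacent in more than one ring space.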

Abstract

Decentralized federated learning (DFL) uses peer-to-peer communication to avoid the single point of failure problem in federated learning and has been considered an attractive solution for machine learning tasks on distributed devices. We provide the first solution to a fundamental network problem of DFL: what overlay network should DFL use to achieve fast training of highly accurate models, low communication, and decentralized construction and maintenance? Overlay topologies of DFL have been investigated, but no existing DFL topology includes decentralized protocols for network construction and topology maintenance. Without these protocols, DFL cannot run in practice. This work presents an overlay network, called FedLay, which provides fast training and low communication cost for practical DFL. FedLay is the first solution for constructing near-random regular topologies in a decentralized manner and maintaining the topologies under node joins and failures. Experiments based on prototype implementation and simulations show that FedLay achieves the fastest model convergence and highest accuracy on real datasets compared to existing DFL solutions while incurring small communication costs and being resilient to node joins and failures.

Paper Structure

This paper contains 30 sections, 3 theorems, 4 equations, 15 figures, 3 tables.

Key Result

Lemma 1

In a ring space of a correct FedLay network and a given coordinate $x$, if a node $v$ is not the node that has the smallest circular distance to $x$ in the space, then $v$ must have an adjacent node $w$ on the ring such that $CD(x, x^v)>CD(x, x^{w})$, where $x^v$ is $v$'s coordinate.
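Lemma 1 is what makes greedy routing on a ring work: as long as the current node is not the closest one to $x$, some ring-adjacent node is strictly closer, so forwarding to the closer neighbor always makes progress. The sketch below illustrates this on a single ring. It assumes coordinates lie in $[0, 1)$ and that $CD$ is the shorter arc length between two coordinates; the summary names the circular distance (Definition 2) but does not state it, so that definition and the helper names `circular_distance`, `ring_neighbors`, and `greedy_route` are assumptions for illustration.

```python
def circular_distance(x, y):
    """Assumed CD: shorter arc between x and y on a unit ring [0, 1)."""
    d = abs(x - y) % 1.0
    return min(d, 1.0 - d)


def ring_neighbors(coords):
    """Map each node to its (predecessor, successor) in coordinate order."""
    order = sorted(coords, key=coords.get)
    n = len(order)
    return {order[i]: (order[i - 1], order[(i + 1) % n]) for i in range(n)}


def greedy_route(coords, start, x):
    """Forward to the ring-adjacent node closest to x until no neighbor
    is strictly closer than the current node."""
    adjacent = ring_neighbors(coords)
    current, path = start, [start]
    while True:
        best = min(adjacent[current],
                   key=lambda w: circular_distance(x, coords[w]))
        if circular_distance(x, coords[best]) >= circular_distance(x, coords[current]):
            return path  # no adjacent node is closer, so stop here
        current = best
        path.append(current)


coords = {"a": 0.05, "b": 0.30, "c": 0.55, "d": 0.80}
path = greedy_route(coords, start="a", x=0.60)
closest = min(coords, key=lambda v: circular_distance(0.60, coords[v]))
```

In this example the route goes from `a` to `d` to `c`, and `c` is also the node with the smallest circular distance to $x = 0.60$, which is the behavior the lemma guarantees for a correct ring.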

Figures (15)

  • Figure 1: Federated learning vs. decentralized federated learning
  • Figure 2: An example of FedLay topology
  • Figure 3: Comparisons of network topologies on the three metrics discussed in Sec. \ref{sec:metrics}.
  • Figure 4: FedLay protocol suite includes two sets of protocols: 1) Neighbor Discovery and Maintenance Protocols; 2) Model Exchange Protocol.
  • Figure 5: An example of the FedLay $\mathtt{join}$ protocol.
  • ...and 10 more figures

Theorems & Definitions (7)

  • Definition 1: A correct FedLay overlay
  • Definition 2: Circular distance
  • Lemma 1
  • proof
  • Theorem 1
  • Theorem 2
  • proof