Decentralized Learning Made Easy with DecentralizePy

Akash Dhasade; Anne-Marie Kermarrec; Rafael Pires; Rishi Sharma; Milos Vujasinovic

TL;DR

DecentralizePy tackles the challenge of evaluating decentralized learning (DL) at scale by providing a modular, open-source framework that supports peer-to-peer training, dynamic topologies, and realistic networking. It enables rapid prototyping and deployment through a one-node-one-process design, dynamic topology management, and pluggable modules for datasets, models, and communication strategies. The paper demonstrates the framework on three case studies: topology dynamics, sparsification, and secure aggregation. Dynamic topologies can approach the accuracy of a fully connected topology at far lower communication cost. Sparsification degrades under non-IID data and large node counts. Secure aggregation preserves privacy at a modest cost in additional communication. These results highlight DecentralizePy’s potential to accelerate practical DL research and inform design choices for scalable, privacy-preserving distributed learning systems in real-world networks.
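To give a feel for why topology matters for communication, the following minimal sketch counts model transfers per round for the topologies compared in the paper's 256-node experiment. It assumes every node sends one full model copy to each of its neighbors in every round; that assumption, the function name `transfers_per_round`, and the resulting counts belong to this illustration and are not measurements from the paper.

```python
def transfers_per_round(num_nodes: int, degree: int) -> int:
    """Model copies sent per round if every node sends one copy to each neighbor."""
    if not 0 <= degree < num_nodes:
        raise ValueError("degree must be in [0, num_nodes)")
    return num_nodes * degree


NUM_NODES = 256  # node count used in the paper's topology experiment

# Node degree in each topology (number of neighbors per node).
topologies = {
    "ring": 2,
    "5-regular": 5,
    "fully connected": NUM_NODES - 1,
}

costs = {name: transfers_per_round(NUM_NODES, d) for name, d in topologies.items()}
for name, cost in costs.items():
    print(f"{name}: {cost} model transfers per round")
```

Under this simplified counting, a fully connected graph needs 51 times as many transfers per round as a 5-regular one, which is the kind of gap a sparser or dynamic topology avoids.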

Abstract

Decentralized learning (DL) has gained prominence for its potential benefits in terms of scalability, privacy, and fault tolerance. It consists of many nodes that coordinate without a central server and exchange millions of parameters in the inherently iterative process of machine learning (ML) training. In addition, these nodes are connected in complex and potentially dynamic topologies. Assessing the intricate dynamics of such networks is clearly not an easy task. Often in the literature, researchers resort to simulated environments that do not scale and fail to capture practical and crucial behaviors, including the ones associated with parallelism, data transfer, network delays, and wall-clock time. In this paper, we propose DecentralizePy, a distributed framework for decentralized ML, which allows for the emulation of large-scale learning networks in arbitrary topologies. We demonstrate the capabilities of DecentralizePy by deploying techniques such as sparsification and secure aggregation on top of several topologies, including dynamic networks with more than one thousand nodes.
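As a rough illustration of the sparsification idea mentioned above, the sketch below has a node share only a random 10% of its parameters, matching the communication budget quoted in the paper's sparsification experiment. The helper names `random_sparsify` and `merge_sparse`, the flat-list model, and the rule of averaging only at the shared indices are assumptions made for this example; they are not DecentralizePy's API or the exact algorithm evaluated in the paper.

```python
import random


def random_sparsify(params, budget, rng):
    """Pick a random `budget` fraction of parameters; return {index: value}."""
    k = max(1, round(len(params) * budget))
    chosen = rng.sample(range(len(params)), k)
    return {i: params[i] for i in chosen}


def merge_sparse(local, received):
    """Average with the sender only at the indices it shared (assumed rule)."""
    merged = list(local)
    for i, value in received.items():
        merged[i] = (local[i] + value) / 2
    return merged


rng = random.Random(0)       # fixed seed so the run is repeatable
sender = [1.0] * 1000        # toy model of the sending node
receiver = [0.0] * 1000      # toy model of the receiving node

message = random_sparsify(sender, budget=0.10, rng=rng)
updated = merge_sparse(receiver, message)
print(len(message), "of", len(sender), "parameters sent")
```

Only the selected indices move toward the sender's values; every other parameter keeps its local value, which hints at why a tight budget can slow down agreement between nodes.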

Paper Structure (23 sections, 6 figures)

This paper contains 23 sections and 6 figures.

Introduction
Contributions
DecentralizePy
Design overview
Modularity.
One-node one-process.
Architecture
Node.
Graph.
Model.
Dataset.
Training.
Sharing and Communication
Mapping, Compression, and Utils
Implementation
...and 8 more sections

Figures (6)

Figure 1: Overview of the decentralizepy framework. Each node along with its driver may run on a separate machine. The driver takes as input the specifications and modules. The node dynamically loads the specified modules. Results are dumped locally and later aggregated. Nodes can be specialized to perform different tasks. In conventional DL, we would have only basic Node(s). To emulate FL, a node can be modified to coordinate the training, shown as the FL server.
Figure 2: A Python code snippet to demonstrate a simple DL node using the modules of decentralizepy colored in red. The node repeatedly trains its model on the local dataset (line 6), exchanges the model with the neighbors (lines 7-10), aggregates the models (line 11), and evaluates the average model on the test set (line 12).
Figure 3: Performance of 256-node DL across three topologies and a dynamic 5-regular graph. (a) The denser the topology, the better the accuracy: fully connected > d-regular > ring. They all run for the same number of communication rounds. (b) When considering emulation time, fully connected takes the longest to perform the same number of rounds. (c) Denser topologies incur significantly more communication costs. We observe that d-regular graphs offer a favorable tradeoff between accuracy, communication, and emulation time compared to ring or fully connected. Dynamic d-regular surprisingly matches the convergence of fully connected across time (b) at significantly lower communication cost (c).
Figure 4: Performance of a 256-node DL comparing the sparsification algorithms of random sampling and Choco-SGD to full sharing. The communication budget is set to 10% and we run all algorithms for the same number of communication rounds. We observe that data non-IIDness and the scale of nodes significantly hurt the performance of sparsification algorithms. Under the same communication budget, full sharing tends to be robust and achieves higher accuracy.
Figure 5: Performance of a 48-node DL comparing Secure Aggregation with standard DL without secure aggregation (D-PSGD). We observe that Secure Aggregation achieves comparable accuracy to D-PSGD on the CelebA dataset while it loses 3% absolute accuracy on CIFAR-10 (first row). The privacy guarantees of Secure Aggregation come at a modest cost in additional communication (second row).
...and 1 more figure
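The node loop described in the caption of Figure 2 (train locally, exchange models with neighbors, aggregate, evaluate) can be sketched in plain Python. This is a self-contained toy, not the code from the figure: it does not use DecentralizePy's modules, and the scalar model `y = w * x`, the 4-node ring, the data shards, and the helper names `local_step` and `mse` are all hypothetical choices made for illustration.

```python
def local_step(w, data, lr=0.1):
    """One gradient-descent step on mean squared error for the model y = w * x."""
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad


def mse(w, data):
    """Mean squared error of the model y = w * x on a dataset."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


# Hypothetical setup: 4 nodes on a ring, each holding a private shard of y = 3x.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
shards = {n: [(x, 3.0 * x) for x in (0.5 + 0.1 * n, 1.0 + 0.1 * n)] for n in neighbors}
test_set = [(x, 3.0 * x) for x in (0.2, 0.7, 1.2)]
weights = {n: 0.0 for n in neighbors}

initial_loss = mse(weights[0], test_set)
for _ in range(50):
    # 1. Every node trains on its own local shard.
    trained = {n: local_step(weights[n], shards[n]) for n in neighbors}
    # 2. Every node receives the freshly trained models of its neighbors.
    inbox = {n: [trained[m] for m in neighbors[n]] for n in neighbors}
    # 3. Every node averages its own model with the received ones.
    weights = {n: (trained[n] + sum(inbox[n])) / (1 + len(inbox[n])) for n in neighbors}
# 4. Evaluate the averaged model of node 0 on the shared test set.
final_loss = mse(weights[0], test_set)
print(f"test loss: {initial_loss:.3f} -> {final_loss:.6f}")
```

All four nodes end up close to the true slope of 3 even though no node ever sees another node's data, only its model.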
