Decentralized Learning Made Easy with DecentralizePy
Akash Dhasade, Anne-Marie Kermarrec, Rafael Pires, Rishi Sharma, Milos Vujasinovic
TL;DR
DecentralizePy tackles the challenge of evaluating decentralized learning (DL) at scale. It is a modular, open-source framework that supports peer-to-peer training, dynamic topologies, and realistic networking. Its one-node-one-process design, dynamic topology management, and pluggable modules for datasets, models, and communication strategies enable rapid prototyping and deployment. The paper demonstrates the framework on three use cases: topology dynamics, sparsification, and secure aggregation. Dynamic topologies can achieve accuracy close to that of a fully connected network at a far lower communication cost; sparsification degrades under non-IID data and large node counts; and secure aggregation preserves privacy at a modest overhead. These results highlight DecentralizePy’s potential to accelerate practical DL research and to inform design choices for scalable, privacy-preserving distributed learning systems deployed over real-world networks.
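The core operation behind peer-to-peer training, where nodes exchange parameters with their neighbors over a topology that may change every round, can be sketched in a few lines. This is a minimal illustration under our own assumptions and not DecentralizePy's API: models are plain Python lists, each node averages uniformly with its neighbors (one common mixing rule among several), the dynamic topology is a random ring re-sampled every round, local training steps are omitted, and all names (`random_ring`, `average_round`, `disagreement`, `run`) are hypothetical.

```python
import random
from typing import List

Vector = List[float]


def random_ring(n: int, rng: random.Random) -> List[List[int]]:
    """Hypothetical helper: neighbor lists for a ring laid over a random permutation
    of the n nodes (requires n >= 3 so each node has two distinct neighbors)."""
    order = list(range(n))
    rng.shuffle(order)
    neighbors: List[List[int]] = [[] for _ in range(n)]
    for i, node in enumerate(order):
        neighbors[node] = [order[(i - 1) % n], order[(i + 1) % n]]
    return neighbors


def average_round(models: List[Vector], neighbors: List[List[int]]) -> List[Vector]:
    """One synchronous averaging step: every node replaces its parameters with the
    uniform mean of its own and its neighbors' parameters. On a ring each node has
    degree 2, so every weight is 1/3, the mixing matrix is doubly stochastic, and the
    network-wide mean of the parameters is preserved."""
    new_models = []
    for node, params in enumerate(models):
        group = [params] + [models[j] for j in neighbors[node]]
        new_models.append([sum(vals) / len(group) for vals in zip(*group)])
    return new_models


def disagreement(models: List[Vector]) -> float:
    """Largest absolute deviation of any node's parameter from the network-wide mean."""
    n, dim = len(models), len(models[0])
    mean = [sum(m[k] for m in models) / n for k in range(dim)]
    return max(abs(m[k] - mean[k]) for m in models for k in range(dim))


def run(models: List[Vector], rounds: int, dynamic: bool, seed: int = 0) -> List[Vector]:
    """Repeat averaging for `rounds` rounds. With dynamic=True the ring is re-sampled
    every round; otherwise one ring is fixed up front. Local SGD steps, which would
    normally precede each averaging step, are omitted."""
    rng = random.Random(seed)
    neighbors = random_ring(len(models), rng)
    for _ in range(rounds):
        if dynamic:
            neighbors = random_ring(len(models), rng)  # new topology this round
        models = average_round(models, neighbors)
    return models


if __name__ == "__main__":
    init_rng = random.Random(42)
    start = [[init_rng.uniform(-1, 1) for _ in range(4)] for _ in range(16)]
    for dynamic in (False, True):
        end = run(start, rounds=20, dynamic=dynamic)
        label = "dynamic" if dynamic else "static"
        print(label, disagreement(start), "->", disagreement(end))
```

In both modes each node talks to only two peers per round. The intuition the sketch is meant to convey is that re-sampling the ring typically spreads information across the network in fewer rounds than a fixed ring of the same degree, which is one way to read the claim that dynamic topologies approach fully connected accuracy at lower communication cost.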
Abstract
Decentralized learning (DL) has gained prominence for its potential benefits in terms of scalability, privacy, and fault tolerance. It consists of many nodes that coordinate without a central server and exchange millions of parameters in the inherently iterative process of machine learning (ML) training. In addition, these nodes are connected in complex and potentially dynamic topologies. Assessing the intricate dynamics of such networks is not an easy task. In the literature, researchers often resort to simulated environments that do not scale and fail to capture practical and crucial behaviors, including those associated with parallelism, data transfer, network delays, and wall-clock time. In this paper, we propose DecentralizePy, a distributed framework for decentralized ML, which allows for the emulation of large-scale learning networks in arbitrary topologies. We demonstrate the capabilities of DecentralizePy by deploying techniques such as sparsification and secure aggregation on top of several topologies, including dynamic networks with more than one thousand nodes.
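Secure aggregation, one of the techniques deployed in the paper, can be illustrated with a generic pairwise additive masking scheme: every pair of nodes sending to the same aggregator shares a random mask that one adds and the other subtracts, so the masks cancel in the sum and the aggregator learns only the total. This is a sketch under stated assumptions, not DecentralizePy's implementation: pairwise seeds are assumed to come from some key agreement, Python's `random` module stands in for a cryptographically secure PRG (it is not one), updates are fixed-point encoded modulo 2^32, node dropouts are not handled, and all names are hypothetical.

```python
import random
from itertools import combinations
from typing import Dict, List, Tuple

MODULUS = 2 ** 32  # arithmetic on masked values is done modulo this (assumption)
SCALE = 2 ** 16    # fixed-point scale used to encode float parameters (assumption)


def encode(vec: List[float]) -> List[int]:
    """Fixed-point encode floats into integers mod MODULUS (negatives wrap around)."""
    return [round(x * SCALE) % MODULUS for x in vec]


def decode(vec: List[int]) -> List[float]:
    """Inverse of encode, interpreting values in the upper half as negative."""
    return [(v - MODULUS if v >= MODULUS // 2 else v) / SCALE for v in vec]


def pairwise_mask(seed: int, dim: int) -> List[int]:
    """Expand a pair's shared seed into a mask. random.Random is NOT cryptographically
    secure; a real system would use a proper PRG keyed by a key-agreement secret."""
    rng = random.Random(seed)
    return [rng.randrange(MODULUS) for _ in range(dim)]


def mask_updates(encoded: Dict[int, List[int]],
                 pair_seeds: Dict[Tuple[int, int], int]) -> Dict[int, List[int]]:
    """For every pair (a, b), node a adds the shared mask and node b subtracts it,
    so each mask cancels once all masked vectors are summed."""
    masked = {node: list(vec) for node, vec in encoded.items()}
    for (a, b), seed in pair_seeds.items():
        mask = pairwise_mask(seed, len(encoded[a]))
        masked[a] = [(x + m) % MODULUS for x, m in zip(masked[a], mask)]
        masked[b] = [(x - m) % MODULUS for x, m in zip(masked[b], mask)]
    return masked


def aggregate(masked: Dict[int, List[int]]) -> List[int]:
    """What the aggregating node computes: the element-wise sum of the masked vectors."""
    return [sum(column) % MODULUS for column in zip(*masked.values())]


if __name__ == "__main__":
    rng = random.Random(7)
    # Hypothetical neighbors 0..3 each send a model update to one aggregating node.
    updates = {node: [rng.uniform(-1, 1) for _ in range(5)] for node in range(4)}
    seeds = {pair: rng.randrange(2 ** 63) for pair in combinations(sorted(updates), 2)}
    masked = mask_updates({n: encode(v) for n, v in updates.items()}, seeds)
    total = decode(aggregate(masked))
    average = [t / len(updates) for t in total]
    print("masked vector of node 0 (first 2 entries):", masked[0][:2])
    print("recovered average:", [round(a, 4) for a in average])
```

The aggregator never sees an individual unmasked update, yet the sum, and hence the average, is exact up to fixed-point rounding. The extra work is one mask per pair of senders, which gives a rough sense of where the overhead of secure aggregation comes from; practical protocols must additionally cope with nodes that drop out mid-round.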
