Get More for Less in Decentralized Learning Systems
Akash Dhasade, Anne-Marie Kermarrec, Rafael Pires, Rishi Sharma, Milos Vujasinovic, Jeffrey Wigger
TL;DR
This paper tackles the high communication cost of decentralized learning (DL) with large neural networks by introducing JWINS, a wavelet-domain sparsification framework in which each node shares only a subset of its parameters under a randomized cut-off. JWINS ranks parameters by accumulating model changes in the wavelet domain and randomizes the sharing rate to balance information exchange against network load, complemented by metadata compression. Empirically, JWINS achieves near full-sharing accuracy on non-IID data across multiple tasks while reducing transmitted data by up to $64\%$, and it outperforms CHOCO-SGD by up to $4\times$ in network savings and wall-clock time at low communication budgets. The results indicate that JWINS scales to hundreds of nodes, is robust to topology changes, and applies across CNNs, LSTMs, and embeddings; theoretical convergence guarantees and adaptive parameter-type ranking remain avenues for future work.
Abstract
Decentralized learning (DL) systems have been gaining popularity because they avoid raw data sharing by communicating only model parameters, hence preserving data confidentiality. However, the large size of deep neural networks poses a significant challenge for decentralized training, since each node needs to exchange gigabytes of data, overloading the network. In this paper, we address this challenge with JWINS, a communication-efficient and fully decentralized learning system that shares only a subset of parameters through sparsification. JWINS uses a wavelet transform to limit the information loss due to sparsification and a randomized communication cut-off that reduces communication without damaging the performance of trained models. We demonstrate empirically with 96 DL nodes on non-IID datasets that JWINS can achieve accuracies similar to those of full-sharing DL while sending up to 64% fewer bytes. Additionally, on low communication budgets, JWINS outperforms the state-of-the-art communication-efficient DL algorithm CHOCO-SGD by up to 4x in terms of network savings and time.
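To make the idea of wavelet-domain sparsification with a randomized cut-off more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a flattened parameter vector of even length, a single-level Haar transform written directly in NumPy, and a hypothetical list of candidate sharing fractions. The names `haar_forward`, `haar_inverse`, `sparsify_round`, and `merge` are illustrative, and the averaging rule in `merge` is an assumption rather than the aggregation scheme used by JWINS.

```python
import numpy as np


def haar_forward(x):
    """One-level orthonormal Haar transform of an even-length 1-D array."""
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)
    detail = (even - odd) / np.sqrt(2.0)
    return np.concatenate([approx, detail])


def haar_inverse(c):
    """Inverse of haar_forward."""
    half = c.size // 2
    approx, detail = c[:half], c[half:]
    x = np.empty_like(c)
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x


def sparsify_round(params, prev_params, accum, fractions, rng):
    """Select the wavelet coefficients to share in one round.

    accum holds the accumulated change of each wavelet coefficient since
    it was last shared (assumed ranking score). fractions is a
    hypothetical list of sharing rates, one of which is drawn at random.
    """
    coeffs = haar_forward(params)
    # Accumulate this round's model change in the wavelet domain.
    accum = accum + haar_forward(params - prev_params)
    # Randomized cut-off: draw the sharing fraction for this round.
    alpha = rng.choice(fractions)
    k = max(1, int(round(alpha * coeffs.size)))
    # Indices of the k coefficients with the largest accumulated change.
    idx = np.argpartition(np.abs(accum), -k)[-k:]
    values = coeffs[idx]
    # Shared coefficients start accumulating from zero again.
    accum[idx] = 0.0
    return idx, values, accum


def merge(local_params, idx, values):
    """Average received coefficients into the local model (assumed rule)."""
    coeffs = haar_forward(local_params)
    coeffs[idx] = 0.5 * (coeffs[idx] + values)
    return haar_inverse(coeffs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    prev = rng.normal(size=16)
    new = prev + 0.1 * rng.normal(size=16)
    idx, values, accum = sparsify_round(
        new, prev, np.zeros(16), fractions=[0.25, 0.5], rng=rng
    )
    print("coefficients shared:", idx.size, "of", new.size)
```

In this sketch a node would transmit only `idx` and `values`; a receiver keeps its own coefficients wherever nothing was received. A multi-level transform from a wavelet library could replace the hand-written Haar step without changing the overall structure.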
