DIGEST: Fast and Communication Efficient Decentralized Learning with Local Updates
Peyman Gholami, Hulya Seferoglu
TL;DR
The paper tackles the high communication costs and potential slow convergence of decentralized learning by introducing DIGEST, an asynchronous framework that fuses Gossip-like information spreading with occasional global-model exchanges driven by local-SGD. It supports both single-stream and multi-stream modes, enabling a tunable trade-off between convergence speed and communication overhead, and provides convergence guarantees for iid and non-iid data across network topologies. Empirical results on logistic regression and ResNet-20 demonstrate that multi-stream DIGEST often outperforms baselines in non-iid settings while maintaining competitive performance in iid scenarios, with clear speed-up gains. Overall, DIGEST offers a practical, topology-agnostic approach to scalable decentralized learning without a central server.
Abstract
Two widely considered decentralized learning algorithms are Gossip and random walk-based learning. Gossip algorithms (both synchronous and asynchronous versions) suffer from high communication cost, while random walk-based learning suffers from increased convergence time. In this paper, we design DIGEST, a fast and communication-efficient asynchronous decentralized learning mechanism that takes advantage of both Gossip and random-walk ideas and focuses on stochastic gradient descent (SGD). DIGEST is an asynchronous decentralized algorithm that builds on local-SGD algorithms, which were originally designed for communication-efficient centralized learning. We design both single-stream and multi-stream DIGEST; the communication overhead may increase as the number of streams increases, which creates a trade-off between convergence and communication overhead that can be leveraged. We analyze the convergence of single- and multi-stream DIGEST, and prove that both algorithms approach the optimal solution asymptotically for both iid and non-iid data distributions. We evaluate the performance of single- and multi-stream DIGEST for logistic regression and a deep neural network, ResNet-20. The simulation results confirm that multi-stream DIGEST converges well: its convergence time is better than or comparable to the baselines in the iid setting, and it outperforms the baselines in the non-iid setting.
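To make the combination of ideas in the abstract concrete, the following is a minimal toy sketch, not the DIGEST algorithm itself. It assumes a ring of nodes, a noiseless least-squares problem shared by all nodes, and a single "token" model that hops to a random neighbor each round while every node keeps taking local SGD steps. The merge rule (a plain average of the token and the visited node's model), the function name, and all parameters are hypothetical choices made for illustration; the paper's actual update rule and its multi-stream variant are not reproduced here.

```python
import numpy as np


def toy_token_walk_with_local_sgd(num_nodes=5, dim=3, rounds=200,
                                  local_steps=4, lr=0.01, mix=0.5, seed=0):
    """Toy illustration only: a token model random-walks on a ring while
    all nodes run local SGD. Returns (initial_loss, final_loss) of the token."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=dim)

    # Assumption: every node holds its own noiseless least-squares data
    # generated from the same w_true, so all local optima coincide.
    data = []
    for _ in range(num_nodes):
        A = rng.normal(size=(20, dim))
        data.append((A, A @ w_true))

    def global_loss(w):
        # Average mean-squared error over all nodes' data.
        return float(np.mean([np.mean((A @ w - b) ** 2) for A, b in data]))

    token = np.zeros(dim)                      # the travelling "global" model
    local = [np.zeros(dim) for _ in range(num_nodes)]
    holder = 0                                 # node currently holding the token
    initial = global_loss(token)

    for _ in range(rounds):
        # Local updates: every node takes a few SGD steps on its own data,
        # without communicating.
        for i, (A, b) in enumerate(data):
            for _ in range(local_steps):
                j = int(rng.integers(len(b)))
                grad = 2.0 * (A[j] @ local[i] - b[j]) * A[j]
                local[i] = local[i] - lr * grad

        # Hypothetical merge rule: average the token with the holder's model,
        # then let the holder continue from the merged model.
        token = (1.0 - mix) * token + mix * local[holder]
        local[holder] = token.copy()

        # Random walk: pass the token to a uniformly chosen ring neighbor.
        holder = int((holder + rng.choice([-1, 1])) % num_nodes)

    return initial, global_loss(token)


if __name__ == "__main__":
    initial, final = toy_token_walk_with_local_sgd()
    print(f"token loss before: {initial:.4f}, after: {final:.6f}")
```

In this sketch only one model is transmitted per round, which is the communication pattern of a random walk, while the local steps between visits play the role of local SGD. Sending several tokens at once would be the natural analogue of a multi-stream mode, at the price of more messages per round.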
