Faster Convergence with Less Communication: Broadcast-Based Subgraph Sampling for Decentralized Learning over Wireless Networks
Daniel Pérez Herrera, Zheng Chen, Erik G. Larsson
TL;DR
This work tackles the challenge of accelerating decentralized learning over wireless networks by exploiting the broadcast nature of wireless channels. It introduces BASS, a broadcast-based framework that constructs a family of sparse subgraphs of the base topology with corresponding mixing matrices. It then randomly samples one subgraph per iteration, using probabilities optimized to minimize the consensus deviation under a given communication budget. The method combines collision-free node scheduling with SDP-based joint optimization of the mixing-matrix weights and the sampling probabilities. It also extends to directed graphs and to simplified heuristics for scalability. Empirical results on MNIST and CIFAR-10 show that BASS converges faster with fewer transmission slots than link-based scheduling baselines such as MATCHA and LMS, which supports the benefits of broadcast-assisted spatial reuse. The work highlights the practical value of aligning communication topology design with consensus-based learning in wireless networks. It points to future work on data heterogeneity and adaptive budget allocation.
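The summary refers to minimizing a consensus deviation over randomly sampled mixing matrices but does not state the exact objective. The sketch below assumes the spectral surrogate commonly used in MATCHA-style analyses. Under that assumption, the per-iteration contraction of the expected squared consensus error is measured by ρ = λ_max(Σ_k p_k W_kᵀW_k − J), where J = 11ᵀ/n. A smaller ρ means faster expected consensus. The helper names are hypothetical. Metropolis-Hastings weights stand in for the SDP-optimized weights, and the sketch only evaluates a given design rather than optimizing it.

```python
import numpy as np


def metropolis_weights(n, edges):
    """Symmetric, doubly stochastic mixing matrix for an undirected subgraph.

    Assumption: Metropolis-Hastings weighting, used here only as a stand-in
    for the SDP-optimized weights described in the text.
    """
    deg = np.zeros(n, dtype=int)
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    W = np.zeros((n, n))
    for i, j in edges:
        w = 1.0 / (1 + max(deg[i], deg[j]))
        W[i, j] = W[j, i] = w
    # Self-weights make every row (and, by symmetry, every column) sum to 1.
    W += np.diag(1.0 - W.sum(axis=1))
    return W


def consensus_rate(mixing_matrices, probs):
    """rho = lambda_max(E[W^T W] - J) for W drawn from mixing_matrices with probs.

    Assumed surrogate: the expected squared distance to the average shrinks
    by at least a factor rho per consensus step.
    """
    if not np.isclose(sum(probs), 1.0):
        raise ValueError("sampling probabilities must sum to 1")
    n = mixing_matrices[0].shape[0]
    J = np.full((n, n), 1.0 / n)
    expected = sum(p * W.T @ W for p, W in zip(probs, mixing_matrices))
    # The matrix is symmetric, so eigvalsh returns real eigenvalues.
    return float(np.linalg.eigvalsh(expected - J).max())


if __name__ == "__main__":
    ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
    W_a = metropolis_weights(4, [(0, 1), (2, 3)])  # first perfect matching
    W_b = metropolis_weights(4, [(1, 2), (3, 0)])  # second perfect matching
    print("alternate matchings:", consensus_rate([W_a, W_b], [0.5, 0.5]))
    print("full ring every step:", consensus_rate([metropolis_weights(4, ring)], [1.0]))
```

On a 4-node ring, alternating between the two perfect matchings with probability 1/2 each gives ρ = 0.5. Activating the full ring every iteration gives ρ = 1/9. A design in the spirit of BASS would weigh such gains against the number of transmission slots that each candidate subgraph requires.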
Abstract
Consensus-based decentralized stochastic gradient descent (D-SGD) is a widely adopted algorithm for decentralized training of machine learning models across networked agents. A crucial part of D-SGD is consensus-based model averaging, which relies heavily on information exchange and fusion among the nodes. For consensus averaging over wireless networks in particular, communication coordination is necessary to determine when and how a node can access the channel and transmit (or receive) information to (or from) its neighbors. In this work, we propose BASS, a broadcast-based subgraph sampling method designed to accelerate the convergence of D-SGD while accounting for the actual communication cost per iteration. BASS creates a set of mixing matrix candidates that represent sparser subgraphs of the base topology. In each consensus iteration, one mixing matrix is sampled. This leads to a specific scheduling decision that activates multiple collision-free subsets of nodes. Sampling is probabilistic, and the elements of the mixing matrices are optimized jointly with their sampling probabilities. Simulation results demonstrate that BASS enables faster convergence with fewer transmission slots than existing link-based scheduling methods. These results indicate that the inherent broadcast nature of wireless channels offers intrinsic advantages for accelerating the convergence of decentralized optimization and learning.
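The abstract does not specify how the collision-free subsets of nodes are formed. The sketch below assumes a half-duplex protocol model. Under this model, a broadcast is lost at a receiver whenever that receiver is itself transmitting or two of its neighbors transmit in the same slot. Two broadcasters therefore conflict if they are neighbors or share a neighbor. Greedily coloring this distance-2 conflict graph yields slots whose node sets can broadcast simultaneously. The function names are hypothetical, and greedy coloring is a simple stand-in, not the scheduler used by BASS.

```python
from itertools import combinations


def conflict_graph(n, edges):
    """Adjacency sets of the conflict graph between broadcasting nodes.

    Assumed half-duplex protocol model: u and v cannot broadcast in the same
    slot if they are neighbors or have a common neighbor (collision there).
    """
    nbrs = [set() for _ in range(n)]
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    conflicts = [set() for _ in range(n)]
    for u, v in combinations(range(n), 2):
        if v in nbrs[u] or nbrs[u] & nbrs[v]:
            conflicts[u].add(v)
            conflicts[v].add(u)
    return conflicts


def greedy_broadcast_schedule(n, edges, order=None):
    """Return a list of slots; each slot is a sorted list of nodes that broadcast together."""
    conflicts = conflict_graph(n, edges)
    # Default order: most-conflicted nodes first (a common greedy-coloring heuristic).
    if order is None:
        order = sorted(range(n), key=lambda u: -len(conflicts[u]))
    slot_of = {}
    for u in order:
        used = {slot_of[v] for v in conflicts[u] if v in slot_of}
        s = 0
        while s in used:  # lowest slot not taken by a conflicting node
            s += 1
        slot_of[u] = s
    num_slots = max(slot_of.values()) + 1 if slot_of else 0
    return [sorted(u for u in slot_of if slot_of[u] == s) for s in range(num_slots)]


if __name__ == "__main__":
    path = [(i, i + 1) for i in range(5)]  # 6-node path 0-1-2-3-4-5
    print(greedy_broadcast_schedule(6, path))
```

For the 6-node path 0-1-2-3-4-5, this produces three slots, [[2, 5], [0, 3], [1, 4]]. Every node broadcasts once in half as many slots as a one-node-per-slot schedule would need. This kind of spatial reuse is what a broadcast-based design can exploit.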
