Faster Convergence with Less Communication: Broadcast-Based Subgraph Sampling for Decentralized Learning over Wireless Networks

Daniel Pérez Herrera, Zheng Chen, Erik G. Larsson

TL;DR

This work tackles the challenge of accelerating decentralized learning over wireless networks by exploiting the broadcast nature of wireless channels. It introduces BASS, a broadcast-based subgraph sampling framework that constructs a family of sparse subgraphs and corresponding mixing matrices, then randomly samples subgraphs with optimized probabilities to minimize the consensus deviation under a given communication budget. The method combines collision-free node scheduling with SDP-based optimization of both the subgraph weights and the sampling probabilities, and it extends to directed graphs, with simplified heuristics for scalability. Empirical results on MNIST and CIFAR-10 show that BASS achieves faster convergence with fewer transmission slots than link-based scheduling baselines such as MATCHA and LMS, validating the benefits of broadcast-assisted spatial reuse. The work highlights the practical impact of aligning communication topology design with consensus-based learning in wireless networks and points to future work on data heterogeneity and adaptive budget allocation.
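The sampling step summarized above can be illustrated with a minimal sketch. It assumes that the communication budget $\mathcal{B}$ caps the number of collision-free subsets a candidate subgraph may activate and that each activated subset costs one transmission slot. The candidate sets, probabilities, and helper names are hypothetical placeholders, not the SDP-optimized values or code from the paper.

```python
import random

# Hypothetical budget B: maximum number of collision-free subsets a candidate may activate.
BUDGET = 4

# Hypothetical candidates: (indices of activated collision-free subsets, sampling probability).
# Each activated subset is assumed to cost one transmission slot.
CANDIDATES = [
    ((0, 1, 2, 3), 0.5),
    ((4, 5, 6), 0.25),
    ((1, 7), 0.25),
]

def check_candidates(candidates, budget):
    """Reject candidate sets whose probabilities do not sum to 1 or that exceed the budget."""
    if abs(sum(p for _, p in candidates) - 1.0) > 1e-12:
        raise ValueError("sampling probabilities must sum to 1")
    if any(len(subsets) > budget for subsets, _ in candidates):
        raise ValueError("a candidate activates more collision-free subsets than the budget allows")

def expected_slots(candidates):
    """Expected number of transmission slots consumed per consensus round."""
    return sum(len(subsets) * p for subsets, p in candidates)

def sample_schedule(candidates, num_iters, seed=0):
    """Draw one candidate index per iteration according to the sampling probabilities."""
    rng = random.Random(seed)
    weights = [p for _, p in candidates]
    return rng.choices(range(len(candidates)), weights=weights, k=num_iters)

check_candidates(CANDIDATES, BUDGET)
print(expected_slots(CANDIDATES))  # 3.25
```

With these placeholder numbers the expected cost is 3.25 slots per round; in BASS the probabilities would come out of the joint optimization rather than being fixed by hand.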

Abstract

Consensus-based decentralized stochastic gradient descent (D-SGD) is a widely adopted algorithm for decentralized training of machine learning models across networked agents. A crucial part of D-SGD is the consensus-based model averaging, which heavily relies on information exchange and fusion among the nodes. Specifically, for consensus averaging over wireless networks, communication coordination is necessary to determine when and how a node can access the channel and transmit (or receive) information to (or from) its neighbors. In this work, we propose $\texttt{BASS}$, a broadcast-based subgraph sampling method designed to accelerate the convergence of D-SGD while considering the actual communication cost per iteration. $\texttt{BASS}$ creates a set of mixing matrix candidates that represent sparser subgraphs of the base topology. In each consensus iteration, one mixing matrix is sampled, leading to a specific scheduling decision that activates multiple collision-free subsets of nodes. The sampling occurs in a probabilistic manner, and the elements of the mixing matrices, along with their sampling probabilities, are jointly optimized. Simulation results demonstrate that $\texttt{BASS}$ enables faster convergence with fewer transmission slots compared to existing link-based scheduling methods. In conclusion, the inherent broadcasting nature of wireless channels offers intrinsic advantages in accelerating the convergence of decentralized optimization and learning.
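The "collision-free subsets of nodes" mentioned in the abstract can be made concrete with a small sketch. It assumes a protocol-style interference model in which two broadcasting nodes collide if they are adjacent or share a common neighbor, so a collision-free subset is a set of nodes that are pairwise more than two hops apart. Greedy distance-2 coloring is one standard way to build such a partition; it is not necessarily the construction used in BASS, and the helper names are hypothetical.

```python
from collections import defaultdict

def two_hop_neighbors(adj, v):
    """Nodes within two hops of v (excluding v itself)."""
    near = set(adj[v])
    for u in adj[v]:
        near.update(adj[u])
    near.discard(v)
    return near

def collision_free_subsets(adj):
    """Greedy distance-2 coloring: nodes sharing a color may broadcast in the same slot
    under the assumed interference model."""
    color = {}
    # Visit high-degree nodes first, a common greedy-coloring heuristic.
    for v in sorted(adj, key=lambda n: -len(adj[n])):
        used = {color[u] for u in two_hop_neighbors(adj, v) if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    subsets = defaultdict(set)
    for v, c in color.items():
        subsets[c].add(v)
    return [subsets[c] for c in sorted(subsets)]

# Example: a 6-node ring 0-1-2-3-4-5-0.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(collision_free_subsets(ring))  # [{0, 3}, {1, 4}, {2, 5}]
```

Each returned subset would occupy one transmission slot, so on this toy ring a full broadcast round costs three slots instead of one slot per directed link.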

Paper Structure (36 sections, 1 theorem, 28 equations, 10 figures, 1 table, 3 algorithms)

Key Result

Theorem 1

[wang2022matcha] Under assumptions $(1-5)$, if the learning rate satisfies $\eta l\leq \min\{1, (\sqrt{\rho^{-1}}-1)/4\}$, where $\rho:= \left\|\mathbb{E}[\mathbf{W}^{\top}(t)\mathbf{W}(t)]-\mathbf{J}\right\|_2$, then after $T$ iterations: where $\bar{\mathbf{x}}(t)=\frac{1}{N}\sum_{i=1}^{N}\mathbf{x}_i(t)$, $\bar{\mathbf{x}}(1)$ is the average of the initial parameter vectors, $F_{\text{inf}}$ is a
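The quantity $\rho$ in Theorem 1 and the resulting learning-rate condition can be evaluated numerically for a given set of candidates. This sketch assumes that $\mathbf{W}(t)$ is drawn i.i.d. from the candidates with their sampling probabilities, so $\mathbb{E}[\mathbf{W}^{\top}\mathbf{W}]=\sum_k p_k\mathbf{W}_k^{\top}\mathbf{W}_k$, and that $\mathbf{J}=\frac{1}{N}\mathbf{1}\mathbf{1}^{\top}$. The two pairwise-averaging matrices on a 4-node ring are hypothetical examples, and the function names are illustrative only.

```python
import math

def matmul(A, B):
    """Plain-Python matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def rho(candidates):
    """rho = || sum_k p_k W_k^T W_k - J ||_2, assuming i.i.d. sampling and J = 11^T / N."""
    n = len(candidates[0][0])
    M = [[-1.0 / n] * n for _ in range(n)]  # start from -J
    for W, p in candidates:
        WtW = matmul(transpose(W), W)
        for i in range(n):
            for j in range(n):
                M[i][j] += p * WtW[i][j]
    # M is symmetric, so ||M||_2 = sqrt(largest eigenvalue of M @ M).
    # Power iteration; assumes the start vector is not orthogonal to the dominant eigenvector.
    M2 = matmul(M, M)
    v = [float(i + 1) for i in range(n)]
    lam = 0.0
    for _ in range(200):
        w = [sum(M2[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        lam = sum(a * b for a, b in zip(v, w)) / sum(a * a for a in v)  # Rayleigh quotient
        v = [x / norm for x in w]
    return math.sqrt(max(lam, 0.0))

def max_eta_times_l(r):
    """Largest eta * l allowed by eta * l <= min{1, (sqrt(1/rho) - 1) / 4}.
    A non-positive result (rho >= 1) means the condition cannot be met."""
    if r == 0.0:
        return 1.0
    return min(1.0, (math.sqrt(1.0 / r) - 1.0) / 4.0)

# Hypothetical candidates on the ring 0-1-2-3-0: average pairs (0,1),(2,3) or (1,2),(3,0).
W1 = [[0.5, 0.5, 0, 0], [0.5, 0.5, 0, 0], [0, 0, 0.5, 0.5], [0, 0, 0.5, 0.5]]
W2 = [[0.5, 0, 0, 0.5], [0, 0.5, 0.5, 0], [0, 0.5, 0.5, 0], [0.5, 0, 0, 0.5]]
r = rho([(W1, 0.5), (W2, 0.5)])
print(round(r, 6), round(max_eta_times_l(r), 4))  # 0.5 0.1036
```

A smaller $\rho$ (faster expected mixing) loosens the step-size condition, which is why the joint optimization of mixing weights and sampling probabilities targets this quantity under the communication budget.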

Figures (10)

  • Figure 1: Timeline of the training process. Each iteration consists of one computation phase and one communication round. Multiple transmission slots might be consumed in each communication round, depending on the number of scheduled nodes/links.
  • Figure 2: (a) Partition of the base graph into collision-free subsets, where different colors represent different collision-free subsets. (b) Subgraph candidates for a communication budget of $\mathcal{B}=4$. (c) Optimization of the mixing matrix candidates and their sampling probabilities. (d) Example of sampled subgraphs per iteration.
  • Figure 3: Performance comparison between optimized and heuristic $\texttt{BASS}$, modified $\texttt{MATCHA}$, modified $\texttt{LMS}$, and full communication with different network topologies.
  • Figure 4: Collision-free subsets and subgraph sampling visualization for the topology in Fig. 3(b). The communication budget is $\mathcal{B}=4$, corresponding to $50\%$ of the total number of collision-free subsets.
  • Figure 5: Impact of the communication budget on the performance of $\texttt{BASS}$.
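The iteration timeline of Figure 1 (one computation phase followed by one communication round) can be simulated with a toy D-SGD loop. Everything here is hypothetical: four nodes with scalar models, noise-free gradients of local losses $f_i(x)=\frac{1}{2}(x-a_i)^2$, two pairwise-averaging mixing matrices sampled with probability $1/2$ each, and an assumed cost of two transmission slots per sampled candidate. The sketch shows that, because every sampled matrix is doubly stochastic, mixing preserves the network average, while the slot counter tracks the communication cost.

```python
import random

# Hypothetical local targets a_i; node i minimizes f_i(x) = 0.5 * (x - a_i)^2.
TARGETS = [1.0, 2.0, 3.0, 4.0]

# Hypothetical doubly stochastic candidates on the ring 0-1-2-3-0.
W1 = [[0.5, 0.5, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5], [0.0, 0.0, 0.5, 0.5]]
W2 = [[0.5, 0.0, 0.0, 0.5], [0.0, 0.5, 0.5, 0.0], [0.0, 0.5, 0.5, 0.0], [0.5, 0.0, 0.0, 0.5]]
CANDIDATES = [(W1, 0.5, 2), (W2, 0.5, 2)]  # (mixing matrix, probability, assumed slots)

def dsgd(num_iters, eta, seed=0):
    """Run the toy loop; returns final models and total transmission slots used."""
    rng = random.Random(seed)
    weights = [p for _, p, _ in CANDIDATES]
    x = [0.0] * len(TARGETS)
    slots_used = 0
    for _ in range(num_iters):
        # Computation phase: one local gradient step per node (noise-free for simplicity).
        y = [xi - eta * (xi - ai) for xi, ai in zip(x, TARGETS)]
        # Communication round: sample one candidate and mix the local models.
        k = rng.choices(range(len(CANDIDATES)), weights=weights, k=1)[0]
        W, _, slots = CANDIDATES[k]
        x = [sum(W[i][j] * y[j] for j in range(len(y))) for i in range(len(y))]
        slots_used += slots
    return x, slots_used

x, slots = dsgd(200, 0.1)
print(slots, round(sum(x) / len(x), 6))  # 400 2.5
```

The average of the models converges to the minimizer of the global average loss, $2.5$ here, whichever candidates are sampled; the sampling probabilities only affect how quickly the individual models agree, which is what the theorem's $\rho$ captures.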

Theorems & Definitions (4)

  • Theorem 1
  • Definition 1
  • Remark 1
  • Remark 2