Bandwidth-Aware Network Topology Optimization for Decentralized Learning

Yipeng Shen, Zehan Zhu, Yan Huang, Changzhi Yan, Cheng Zhuo, Jinming Xu

TL;DR

The paper addresses the inefficiency of decentralized learning caused by fixed, bandwidth-agnostic topologies. It formulates bandwidth-aware topology design as a constrained optimization problem that maximizes consensus speed under edge-cardinality constraints, uses a unified M/e-based representation to cover both homogeneous and heterogeneous bandwidths, and solves the resulting Mixed-Integer SDP with an ADMM-based method accelerated by conjugate gradient. Key contributions are a bandwidth-aware edge-allocation scheme, a model of three kinds of heterogeneity (node-level, intra-server, inter-switch), and computation that scales to hundreds of nodes. Empirically, BA-Topo consistently outperforms the benchmark topologies in consensus speed and shortens decentralized SGD training on real-world datasets, with speedups of more than $1.11\times$ (homogeneous bandwidth) and $1.21\times$ (heterogeneous bandwidth) in the time needed to reach the target test accuracy.

Abstract

Network topology is critical for efficient parameter synchronization in distributed learning over networks. However, most existing studies do not account for bandwidth limitations in network topology design. In this paper, we propose a bandwidth-aware network topology optimization framework to maximize consensus speed under edge cardinality constraints. For heterogeneous bandwidth scenarios, we introduce a maximum bandwidth allocation strategy for the edges to ensure efficient communication among nodes. By reformulating the problem into an equivalent Mixed-Integer SDP problem, we leverage a computationally efficient ADMM-based method to obtain topologies that yield the maximum consensus speed. Within the ADMM substep, we adopt the conjugate gradient method to efficiently solve large-scale linear equations to achieve better scalability. Experimental results demonstrate that the resulting network topologies outperform the benchmark topologies in terms of consensus speed, and reduce the training time required for decentralized learning tasks on real-world datasets to achieve the target test accuracy, exhibiting speedups of more than $1.11\times$ and $1.21\times$ for homogeneous and heterogeneous bandwidth settings, respectively.
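
The abstract mentions using the conjugate gradient method to solve large linear systems inside the ADMM substep. As a rough illustration only, the following is a minimal textbook conjugate gradient sketch in Python with NumPy. The function name `conjugate_gradient`, the tolerance, and the random test matrix are assumptions made for this example; the specific linear system that arises in the paper's ADMM step is not reproduced here.

```python
import numpy as np


def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for a symmetric positive-definite matrix A.

    Textbook conjugate gradient; hypothetical helper, not the paper's code.
    """
    n = b.shape[0]
    max_iter = max_iter or 10 * n
    x = np.zeros(n)
    r = b - A @ x          # initial residual
    p = r.copy()           # initial search direction
    rs_old = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs_old) < tol:
            break
        Ap = A @ p
        alpha = rs_old / (p @ Ap)          # step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p      # next A-conjugate direction
        rs_old = rs_new
    return x


# Illustrative system (assumption): a well-conditioned random SPD matrix.
rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50))
A = M @ M.T + 50 * np.eye(50)
b = rng.standard_normal(50)

x = conjugate_gradient(A, b)
residual = np.linalg.norm(A @ x - b)
```

Each iteration needs only one matrix-vector product with `A`, which is why the method is attractive when the system is too large to factorize directly.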

Paper Structure

This paper contains 26 sections, 1 theorem, 33 equations, 10 figures, 2 tables, 2 algorithms.

Key Result

Lemma 1

For a Laplacian matrix $L$ with eigenvalues satisfying Eq. tab:eig-of-laplacian-matrix, if $\alpha \geqslant \lambda_{n-1}(L)$, then $L + \frac{\alpha \mathbf{1}\mathbf{1}^T}{n} \succcurlyeq \lambda_{n-1}(L) I$.
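
Lemma 1 can be checked numerically on a small example. The sketch below assumes the eigenvalues are indexed in descending order, so that $\lambda_n(L) = 0$ and $\lambda_{n-1}(L)$ is the second-smallest eigenvalue; the referenced equation is not shown here, so this ordering is an assumption. The ring graph with $n=16$ and the helper names `ring_laplacian` and `min_eig_gap` are also chosen purely for illustration.

```python
import numpy as np


def ring_laplacian(n):
    """Graph Laplacian of an undirected ring on n nodes (illustrative graph)."""
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = 1.0
        A[(i + 1) % n, i] = 1.0
    return np.diag(A.sum(axis=1)) - A


n = 16
L = ring_laplacian(n)

# eigvalsh returns eigenvalues in ascending order, so index 1 is the
# second-smallest one, assumed here to be lambda_{n-1}(L).
lam = np.linalg.eigvalsh(L)[1]
ones = np.ones((n, 1))


def min_eig_gap(alpha):
    """Smallest eigenvalue of L + alpha * 11^T / n - lam * I.

    A non-negative value (up to round-off) means the matrix inequality holds.
    """
    shifted = L + alpha * (ones @ ones.T) / n - lam * np.eye(n)
    return np.linalg.eigvalsh(shifted).min()


gap_ok = min_eig_gap(lam)          # alpha equal to lambda_{n-1}(L)
gap_big = min_eig_gap(lam + 3.0)   # alpha larger than lambda_{n-1}(L)
gap_bad = min_eig_gap(0.5 * lam)   # alpha below the threshold
```

The rank-one term only moves the eigenvalue associated with the all-ones vector from $0$ to $\alpha$ and leaves the others unchanged, so the smallest eigenvalue of the shifted matrix is $\min(\alpha, \lambda_{n-1}(L))$. In this example `gap_ok` and `gap_big` are zero up to round-off, while `gap_bad` is negative.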

Figures (10)

  • Figure 1: Comparison of consensus speed among various topologies with $n=16$ in homogeneous bandwidth scenario.
  • Figure 2: Comparison of consensus speed among various topologies with $n=16$ in node-level bandwidth heterogeneity scenario.
  • Figure 3: Standard server architecture.
  • Figure 4: Comparison of consensus speed among various topologies with $n=8$ in intra-server link bandwidth heterogeneity scenario.
  • Figure 5: BCube topology structure for $p=4$ and $k=2$.
  • ...and 5 more figures

Theorems & Definitions (3)

  • Lemma 1: Proposition 1 in dai2011optimal
  • Remark 1
  • Remark 2