Row-stochastic matrices can provably outperform doubly stochastic matrices in decentralized learning

Bing Liu, Boao Kong, Limin Lu, Kun Yuan, Chengcheng Zhao

TL;DR

The paper addresses decentralized learning with heterogeneous node weights by contrasting two designs: embedding the weights into the local losses, which keeps a doubly stochastic mixing matrix (Strategy I), versus keeping the original losses and embedding the weights in a row-stochastic mixing matrix (Strategy II). It introduces a weighted Hilbert space $L^2(\lambda;\mathbb{R}^d)$ in which the row-stochastic matrix is self-adjoint, yielding tighter consensus-error bounds than the doubly stochastic matrix, which is not self-adjoint in this geometry; the resulting convergence guarantees are strictly tighter than those from Euclidean analyses. The authors derive explicit spectral-gap-dependent conditions (via Rayleigh quotients and Loewner-order comparisons) under which the row-stochastic strategy converges faster, and they translate these into topology-design guidelines, notably allocating node degrees in proportion to node weights. Experiments on a synthetic least-squares problem and on CIFAR-10 with ResNet-18 corroborate the theory: Strategy II consistently outperforms Strategy I across topologies. Overall, the work provides a principled framework and practical topology-design rules for efficient decentralized learning with heterogeneous node contributions.
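To make the self-adjointness claim concrete, here is a minimal NumPy sketch. It assumes the weighted inner product $\langle x, y\rangle_\lambda = \sum_i \lambda_i x_i y_i$, under which a matrix $W$ is self-adjoint exactly when $\mathrm{diag}(\lambda) W$ is symmetric. The 8-node ring, the weights, and both matrices are illustrative assumptions, not the paper's construction: a hypothetical $\lambda$-induced row-stochastic matrix $W = I - \eta\,\mathrm{diag}(\lambda)^{-1} L$ built from the graph Laplacian $L$, and the standard Metropolis doubly stochastic matrix. The spectral-gap helper uses the usual definition $1 - $ (second-largest eigenvalue modulus); the paper's $\rho_J$ may be defined differently.

```python
import numpy as np


def ring_adjacency(n):
    """Adjacency matrix of an n-node ring (an illustrative topology choice)."""
    adj = np.zeros((n, n))
    for i in range(n):
        adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0
    return adj


def row_stochastic_from_weights(adj, lam):
    """Hypothetical lambda-induced row-stochastic matrix W = I - eta * diag(lam)^{-1} L.

    diag(lam) @ W = diag(lam) - eta * L is symmetric, so W is self-adjoint in the
    lambda-weighted inner product, and lam^T W = lam^T because L has zero row sums.
    """
    deg = adj.sum(axis=1)
    lap = np.diag(deg) - adj
    eta = 0.5 * np.min(lam / deg)  # keeps every diagonal entry >= 0.5
    return np.eye(len(lam)) - eta * lap / lam[:, None]


def metropolis_doubly_stochastic(adj):
    """Standard Metropolis weights: symmetric and doubly stochastic."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    w = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if adj[i, j]:
                w[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        w[i, i] = 1.0 - w[i].sum()  # diagonal absorbs the remaining mass
    return w


def is_self_adjoint(w, lam, tol=1e-12):
    """True if <W x, y>_lam == <x, W y>_lam for all x, y, i.e. diag(lam) @ W is symmetric."""
    m = np.diag(lam) @ w
    return np.allclose(m, m.T, atol=tol)


def second_eigenvalue_modulus(sym):
    """Second-largest |eigenvalue| of a symmetric matrix."""
    ev = np.sort(np.abs(np.linalg.eigvalsh(sym)))
    return ev[-2]


n = 8
lam = np.arange(1, n + 1, dtype=float)
lam /= lam.sum()  # heterogeneous node weights summing to one
adj = ring_adjacency(n)
w_row = row_stochastic_from_weights(adj, lam)
w_ds = metropolis_doubly_stochastic(adj)

# A W that is self-adjoint in L^2(lam) is similar to the symmetric matrix
# diag(sqrt(lam)) W diag(1/sqrt(lam)), so its eigenvalues are real.
d = np.sqrt(lam)
rho_row = second_eigenvalue_modulus(d[:, None] * w_row / d[None, :])
rho_ds = second_eigenvalue_modulus(w_ds)

print("row-stochastic self-adjoint in L2(lam):", is_self_adjoint(w_row, lam))  # True
print("doubly stochastic self-adjoint in L2(lam):", is_self_adjoint(w_ds, lam))  # False
print("spectral gaps (row, doubly):", 1 - rho_row, 1 - rho_ds)
```

The Metropolis matrix fails the check because $\lambda_i W_{ij} \ne \lambda_j W_{ji}$ whenever neighboring weights differ. The check only exposes the geometric asymmetry; by itself it does not say which strategy converges faster, since the paper attributes the difference to both spectral gaps and the extra penalty terms.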

Abstract

Decentralized learning often involves a weighted global loss with heterogeneous node weights $λ$. We revisit two natural strategies for incorporating these weights: (i) embedding them into the local losses to retain a uniform weight (and thus a doubly stochastic matrix), and (ii) keeping the original losses while employing a $λ$-induced row-stochastic matrix. Although prior work shows that both strategies yield the same expected descent direction for the global loss, it remains unclear whether the Euclidean-space guarantees are tight and what fundamentally differentiates their behaviors. To clarify this, we develop a weighted Hilbert-space framework $L^2(λ;\mathbb{R}^d)$ and obtain convergence rates that are strictly tighter than those from Euclidean analysis. In this geometry, the row-stochastic matrix becomes self-adjoint whereas the doubly stochastic one does not, creating additional penalty terms that amplify consensus error, thereby slowing convergence. Consequently, the difference in convergence arises not only from spectral gaps but also from these penalty terms. We then derive sufficient conditions under which the row-stochastic design converges faster even with a smaller spectral gap. Finally, by using a Rayleigh-quotient and Loewner-order eigenvalue comparison, we further obtain topology conditions that guarantee this advantage and yield practical topology-design guidelines.
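The claim that both strategies share the same expected descent direction can be checked at a consensus point with a short sketch. It assumes Strategy I rescales each local loss to $n\lambda_i f_i$ (so that $\sum_i \lambda_i f_i = \frac{1}{n}\sum_i n\lambda_i f_i$) and uses random least-squares local losses; the data, sizes, and function names are hypothetical. The sketch says nothing about mixing or consensus error, which is where the paper locates the real difference between the strategies.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 3
lam = rng.random(n)
lam /= lam.sum()  # heterogeneous node weights on the probability simplex
A = rng.standard_normal((n, 4, d))  # hypothetical local least-squares data
b = rng.standard_normal((n, 4))


def local_grad(i, x):
    """Gradient of the original local loss f_i(x) = 0.5 * ||A_i x - b_i||^2."""
    return A[i].T @ (A[i] @ x - b[i])


def strategy_one_direction(x):
    """Strategy I: reweighted losses n * lam_i * f_i, averaged uniformly."""
    return np.mean([n * lam[i] * local_grad(i, x) for i in range(n)], axis=0)


def strategy_two_direction(x):
    """Strategy II: original losses f_i, averaged with the weights lam."""
    return sum(lam[i] * local_grad(i, x) for i in range(n))


x = rng.standard_normal(d)
print(np.allclose(strategy_one_direction(x), strategy_two_direction(x)))  # True
```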

Paper Structure

This paper contains 49 sections, 20 theorems, 200 equations, 7 figures, 1 table, 6 algorithms.

Key Result

Proposition 6.4

Under the paper's standing assumptions (from smoothness through graph connectivity), the following accumulated consensus-error bounds hold (see the proof in the appendix on consensus error). (Strategy I) If $\alpha < \frac{1}{2\lambda_{\max}\beta}\sqrt{\frac{1}{15B(\rho_J)}}$, then … where $A(\rho_J):=\frac{1+\rho_J^2}{(1-\rho_J^2)^3}$, $B(\rho_J):=\frac{2(1+3\rho_J^4)}{(1-\rho_J^2)^3(1-\rho_J)}$, and $c_\lambda=\frac{1}{n^2}\cdots$
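The constants and the Strategy I step-size condition above can be evaluated directly. This sketch transcribes $A(\rho_J)$, $B(\rho_J)$, and the threshold on $\alpha$ as written; reading $\beta$ as the smoothness constant and $\lambda_{\max}$ as the largest node weight is an assumption, and the function names are hypothetical.

```python
import math


def a_coef(rho):
    """A(rho_J) = (1 + rho^2) / (1 - rho^2)^3."""
    return (1 + rho**2) / (1 - rho**2) ** 3


def b_coef(rho):
    """B(rho_J) = 2 (1 + 3 rho^4) / ((1 - rho^2)^3 (1 - rho))."""
    return 2 * (1 + 3 * rho**4) / ((1 - rho**2) ** 3 * (1 - rho))


def strategy_one_step_bound(rho, lam_max, beta):
    """Strategy I threshold: alpha < 1/(2 lam_max beta) * sqrt(1 / (15 B(rho)))."""
    if not 0 <= rho < 1:
        raise ValueError("rho_J must lie in [0, 1)")
    return math.sqrt(1 / (15 * b_coef(rho))) / (2 * lam_max * beta)


# The admissible step size shrinks quickly as rho_J approaches 1 (poor mixing).
for rho in (0.0, 0.5, 0.9):
    print(rho, a_coef(rho), b_coef(rho), strategy_one_step_bound(rho, lam_max=0.5, beta=1.0))
```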

Figures (7)

  • Figure 1: Weighted gradient norms for least-squares experiment.
  • Figure 2: Interval losses for CIFAR-10 experiment under $\lambda_1$.
  • Figure 3: Interval losses for CIFAR-10 experiment under $\lambda_2$.
  • Figure 4: Adjacency matrix of $\mathcal{G}_{\lambda_A}$. If node $i$ is connected with node $j$, the $(i,j)$ block is blue.
  • Figure 5: Adjacency matrix of $\mathcal{G}_{\lambda_B}$. If node $i$ is connected with node $j$, the $(i,j)$ block is blue.
  • ...and 2 more figures

Theorems & Definitions (44)

  • Remark 4.1
  • Proposition 6.4
  • Corollary 6.5
  • Remark 6.6
  • Corollary 7.1
  • Remark 7.2
  • Lemma 2.1
  • proof
  • Lemma 2.2
  • proof
  • ...and 34 more