Row-stochastic matrices can provably outperform doubly stochastic matrices in decentralized learning
Bing Liu, Boao Kong, Limin Lu, Kun Yuan, Chengcheng Zhao
TL;DR
The paper addresses decentralized learning with heterogeneous node weights by contrasting two designs: embedding the weights into the local losses and mixing with a doubly stochastic matrix (Strategy I), versus keeping the original losses and mixing with a weight-induced row-stochastic matrix (Strategy II). It introduces a weighted Hilbert space $L^2(\lambda;\mathbb{R}^d)$ in which the row-stochastic matrix is self-adjoint while the doubly stochastic matrix is not; the resulting consensus-error bounds are tighter for the row-stochastic design, and the convergence guarantees are strictly tighter than those from Euclidean analyses. The authors derive explicit spectral-gap-dependent conditions (via Rayleigh quotients and Loewner-order comparisons) under which the row-stochastic strategy converges faster, and they translate these into topology-design guidelines, notably allocating node degrees in proportion to node weights. Experiments on synthetic least-squares problems and on CIFAR-10 with ResNet-18 corroborate the theoretical findings, showing that Strategy II consistently outperforms Strategy I across topologies. Overall, the work provides a principled framework and practical topology-design rules for efficient decentralized learning with heterogeneous node contributions.
Abstract
Decentralized learning often involves a weighted global loss with heterogeneous node weights $\lambda$. We revisit two natural strategies for incorporating these weights: (i) embedding them into the local losses to retain a uniform weight (and thus a doubly stochastic matrix), and (ii) keeping the original losses while employing a $\lambda$-induced row-stochastic matrix. Although prior work shows that both strategies yield the same expected descent direction for the global loss, it remains unclear whether the Euclidean-space guarantees are tight and what fundamentally differentiates their behaviors. To clarify this, we develop a weighted Hilbert-space framework $L^2(\lambda;\mathbb{R}^d)$ and obtain convergence rates that are strictly tighter than those from Euclidean analysis. In this geometry, the row-stochastic matrix becomes self-adjoint whereas the doubly stochastic one does not; this lack of self-adjointness creates additional penalty terms that amplify the consensus error and thereby slow convergence. Consequently, the difference in convergence arises not only from spectral gaps but also from these penalty terms. We then derive sufficient conditions under which the row-stochastic design converges faster even with a smaller spectral gap. Finally, using a Rayleigh-quotient and Loewner-order eigenvalue comparison, we obtain topology conditions that guarantee this advantage and yield practical topology-design guidelines.
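
The self-adjointness claim can be checked numerically on a toy example. The sketch below assumes scalar node states ($d = 1$), the weighted inner product $\langle x, y \rangle_\lambda = \sum_i \lambda_i x_i y_i$, and a Metropolis-Hastings construction for the row-stochastic matrix. The abstract does not specify how the $\lambda$-induced matrix is built, so this construction, the three-node path graph, the weights, and all function names are illustrative assumptions rather than the authors' method. A matrix $M$ is self-adjoint in this inner product exactly when $\lambda_i M_{ij} = \lambda_j M_{ji}$ for all $i, j$.

```python
# Sketch under stated assumptions: the Metropolis-Hastings construction and the
# toy graph/weights below are illustrative choices, not taken from the paper.


def metropolis_matrix(adj, lam):
    """Row-stochastic matrix supported on graph `adj` that satisfies
    detailed balance: lam[i] * M[i][j] == lam[j] * M[j][i]."""
    n = len(lam)
    deg = [sum(row) for row in adj]
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and adj[i][j]:
                # Propose a neighbor uniformly, accept with the MH ratio.
                M[i][j] = min(1.0 / deg[i], lam[j] / (lam[i] * deg[j]))
        # Remaining mass stays on the diagonal so that row i sums to one.
        M[i][i] = 1.0 - sum(M[i])
    return M


def weighted_inner(x, y, lam):
    """<x, y>_lam = sum_i lam[i] * x[i] * y[i] (scalar states, d = 1)."""
    return sum(l * a * b for l, a, b in zip(lam, x, y))


def matvec(M, x):
    """Plain matrix-vector product M @ x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def adjoint_defect(M, lam):
    """max_ij |lam[i] M[i][j] - lam[j] M[j][i]|; zero iff M is
    self-adjoint with respect to the lam-weighted inner product."""
    n = len(lam)
    return max(
        abs(lam[i] * M[i][j] - lam[j] * M[j][i])
        for i in range(n)
        for j in range(n)
    )


adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # path graph 0 - 1 - 2
lam = [0.5, 0.3, 0.2]                    # heterogeneous node weights
uniform = [1.0 / 3.0] * 3

A = metropolis_matrix(adj, lam)      # row-stochastic, built from lam
W = metropolis_matrix(adj, uniform)  # symmetric, hence doubly stochastic

# Compare <x, M y>_lam with <M x, y>_lam for both matrices.
x, y = [1.0, -2.0, 0.5], [0.3, 0.7, -1.0]
gap_A = weighted_inner(x, matvec(A, y), lam) - weighted_inner(matvec(A, x), y, lam)
gap_W = weighted_inner(x, matvec(W, y), lam) - weighted_inner(matvec(W, x), y, lam)

print("adjoint defect of A in L2(lam):", adjoint_defect(A, lam))
print("adjoint defect of W in L2(lam):", adjoint_defect(W, lam))
print("inner-product gap for A:", gap_A)
print("inner-product gap for W:", gap_W)
```

In this toy setting the row-stochastic matrix `A` has an adjoint defect at floating-point level, while the doubly stochastic matrix `W` has a clearly nonzero one because $\lambda$ is not uniform. This only illustrates the geometric distinction; it says nothing about the size of the penalty terms or the convergence rates derived in the paper.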
