Communication-Efficient Federated Optimization over Semi-Decentralized Networks
He Wang, Yuejie Chi
TL;DR
This work tackles the communication bottleneck in large-scale federated and decentralized learning by proposing PISCO, a gradient-tracking-based algorithm for semi-decentralized networks in which agents access the server with probability $p$. PISCO combines multiple local updates with a probabilistic mix of agent-to-server and agent-to-agent communication, and achieves a linear speedup in both the number of agents $n$ and the number of local updates $T_o$. The authors prove convergence to a stationary point at rate $O\left(\dfrac{1}{\sqrt{nT_oK}}\right)$ with mini-batch gradients and $O\left(\dfrac{1}{nK}\right)$ with full-batch gradients, and show that the network dependency reduces to $O\left(\lambda_p^{-2}\right)$ under appropriate $p$ and connectivity, all without assuming bounded data dissimilarity. Experiments on logistic regression with nonconvex regularization and on neural networks confirm improved communication efficiency, robustness to data heterogeneity, and resilience across network topologies.
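To make the "linear speedup" concrete, the mini-batch rate can be inverted into a round count. This is only a back-of-the-envelope reading under two assumptions not stated above: that $K$ counts communication rounds, and that the constant $C$ hidden in the $O(\cdot)$ does not depend on $n$, $T_o$, or $K$. For a target accuracy $\varepsilon$ on the stationarity measure,

$$
\frac{C}{\sqrt{n T_o K}} \le \varepsilon
\quad\Longleftrightarrow\quad
K \ge \frac{C^2}{n\, T_o\, \varepsilon^2},
$$

so, in this reading, doubling either the number of agents $n$ or the number of local updates $T_o$ halves the number of rounds needed to reach the same $\varepsilon$.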
Abstract
In large-scale federated and decentralized learning, communication efficiency is one of the most challenging bottlenecks. While gossip communication, in which agents exchange information with their connected neighbors, is cheaper than communicating with a remote server, it often requires more communication rounds, especially on large and sparse networks. To address this trade-off, we examine communication efficiency under a semi-decentralized communication protocol, in which agents perform both agent-to-agent and agent-to-server communication in a probabilistic manner. We design a tailored communication-efficient algorithm for semi-decentralized networks, referred to as PISCO, which is robust to data heterogeneity thanks to gradient tracking and allows multiple local updates to save communication. We establish the convergence rate of PISCO for nonconvex problems and show that PISCO enjoys a linear speedup in terms of the number of agents and local updates. Our numerical results highlight the superior communication efficiency of PISCO and its resilience to data heterogeneity and various network topologies.
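The semi-decentralized protocol described above can be illustrated with a small simulation. The sketch below is not PISCO itself: it is plain gradient tracking with a single gradient step per round (local updates are omitted), applied to toy quadratic losses $f_i(x) = \tfrac{1}{2}(x - b_i)^2$. It also assumes that a server round amounts to exact averaging across all agents and that a gossip round applies a fixed ring mixing matrix. The names `ring_mixing_matrix` and `run_gradient_tracking` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np


def ring_mixing_matrix(n):
    """Doubly stochastic gossip matrix for a ring (assumed topology):
    each agent averages itself and its two neighbors with weight 1/3."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] += 1.0 / 3.0
    return W


def run_gradient_tracking(b, W, p, eta, num_rounds, seed=0):
    """Gradient tracking where each round mixes through the server
    (exact averaging) with probability p, and through gossip (W) otherwise.

    Agent i holds the toy loss f_i(x) = 0.5 * (x - b[i])**2, so the
    minimizer of the average loss is b.mean().
    """
    rng = np.random.default_rng(seed)
    b = np.asarray(b, dtype=float)
    n = len(b)
    J = np.full((n, n), 1.0 / n)  # server round: every agent gets the mean

    def grad(x):
        return x - b  # per-agent gradients, stacked in one vector

    x = np.zeros(n)      # one scalar model per agent
    g_old = grad(x)
    y = g_old.copy()     # tracking variable, initialized to local gradients

    for _ in range(num_rounds):
        M = J if rng.random() < p else W   # probabilistic choice of channel
        x = M @ (x - eta * y)              # descend along tracked gradient, then mix
        g_new = grad(x)
        y = M @ (y + g_new - g_old)        # update the tracker, then mix
        g_old = g_new
    return x


if __name__ == "__main__":
    b = np.arange(8, dtype=float)  # heterogeneous local optima 0, 1, ..., 7
    x = run_gradient_tracking(b, ring_mixing_matrix(8), p=0.2,
                              eta=0.05, num_rounds=1000)
    print("max distance to global optimum:", np.abs(x - b.mean()).max())
```

Because both mixing matrices are doubly stochastic, the average of the tracking variable always equals the average of the local gradients, which is why the agents agree on the global minimizer even though their local optima differ. Setting `p=0` recovers pure gossip and `p=1` recovers server averaging in every round.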
