Decentralized Sporadic Federated Learning: A Unified Algorithmic Framework with Convergence Guarantees
Shahryar Zehtabi, Dong-Jun Han, Rohit Parasnis, Seyyedali Hosseinalipour, Christopher G. Brinton
TL;DR
This work introduces Decentralized Sporadic Federated Learning (DSpodFL), a unified framework that models sporadic local updates and sporadic inter-client communications via binary indicator variables on a time-varying graph. It provides convergence guarantees for both convex and non-convex losses, accommodating heterogeneous and time-varying resources and gradient noise, and shows that existing decentralized methods are special cases of DSpodFL. Theoretical results establish conditions under which the algorithm converges and quantify the impact of sporadicity on convergence rate and consensus. Empirical results on FMNIST and CIFAR-10 demonstrate that DSpodFL achieves faster training with lower delay than baseline decentralized methods, particularly under high heterogeneity and dynamics, underscoring its practical value in resource-constrained, serverless settings.
Abstract
Decentralized federated learning (DFL) captures FL settings where both (i) model updates and (ii) model aggregations are carried out exclusively by the clients, without a central server. Existing DFL work has mostly focused on settings where clients conduct a fixed number of local updates between local model exchanges, overlooking heterogeneity and dynamics in communication and computation capabilities. In this work, we propose Decentralized Sporadic Federated Learning ($\texttt{DSpodFL}$), a DFL methodology built on a generalized notion of $\textit{sporadicity}$ in both the local gradient and aggregation processes. $\texttt{DSpodFL}$ subsumes many existing decentralized optimization methods under a unified algorithmic framework by modeling the per-iteration (i) occurrence of gradient descent at each client and (ii) exchange of models between client pairs as arbitrary indicator random variables, thus capturing $\textit{heterogeneous and time-varying}$ computation/communication scenarios. We analytically characterize the convergence behavior of $\texttt{DSpodFL}$ for both convex and non-convex models and for both constant and diminishing learning rates, under mild assumptions on the communication graph connectivity, data heterogeneity across clients, and gradient noise. We show how our bounds recover existing results from decentralized gradient descent as special cases. Experiments demonstrate that $\texttt{DSpodFL}$ consistently achieves improved training speeds compared with baselines under various system settings.
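
To make the two kinds of sporadicity concrete, the following is a minimal NumPy sketch of one way the per-iteration indicators could enter a decentralized update. It is an illustration under stated assumptions, not the algorithm as specified in the paper: the function name `dspodfl_sketch`, the quadratic local losses, the Bernoulli sampling of the indicators, the fixed mixing weight `mix`, and the ring topology are all hypothetical choices made here for the example.

```python
import numpy as np


def dspodfl_sketch(targets, adjacency, grad_prob, link_prob,
                   lr=0.1, mix=0.3, num_iters=500, seed=0):
    """Toy simulation of sporadic gradient steps and sporadic model exchanges.

    Assumption (not from the paper): client i holds the local loss
    f_i(theta) = 0.5 * ||theta - targets[i]||^2, and both indicator
    variables are drawn as independent Bernoulli samples.
    Choose mix and lr so that mix * (max degree) + lr <= 1.
    """
    rng = np.random.default_rng(seed)
    adjacency = np.asarray(adjacency, dtype=bool)
    n, dim = targets.shape
    theta = np.zeros((n, dim))  # one model per client, all starting at zero

    for _ in range(num_iters):
        # Indicator: does client i take a gradient step in this iteration?
        v = rng.random(n) < grad_prob

        # Indicator: is link (i, j) used in this iteration? Sample the upper
        # triangle and mirror it so that exchanges are bidirectional.
        coin = np.triu(rng.random((n, n)) < link_prob, k=1)
        active = (coin | coin.T) & adjacency

        # Noisy local gradients of the quadratic losses.
        grads = (theta - targets) + 0.01 * rng.standard_normal((n, dim))

        # Consensus term: sum_j mix * active[i, j] * (theta_j - theta_i).
        w = mix * active
        consensus = w @ theta - w.sum(axis=1, keepdims=True) * theta

        # Clients with v[i] == False skip the gradient step entirely.
        theta = theta + consensus - lr * v[:, None] * grads

    return theta


# Usage: six clients on a ring with heterogeneous gradient-step probabilities.
n = 6
targets = np.random.default_rng(1).normal(size=(n, 2)) + 5.0
ring = np.zeros((n, n), dtype=bool)
for i in range(n):
    ring[i, (i + 1) % n] = ring[(i + 1) % n, i] = True
grad_prob = np.linspace(0.2, 1.0, n)  # slower clients compute less often
theta = dspodfl_sketch(targets, ring, grad_prob, link_prob=0.5)
```

In this sketch, setting `grad_prob` and `link_prob` to 1 makes every client compute and communicate in every iteration, which corresponds to a plain decentralized gradient descent step on the toy problem. With a constant learning rate and unequal `grad_prob`, the client models in this toy setup settle in a neighborhood of the targets' average rather than exactly on it.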
