Decentralized Sporadic Federated Learning: A Unified Algorithmic Framework with Convergence Guarantees

Shahryar Zehtabi; Dong-Jun Han; Rohit Parasnis; Seyyedali Hosseinalipour; Christopher G. Brinton

TL;DR

This work introduces Decentralized Sporadic Federated Learning (DSpodFL), a unified framework that models sporadic local updates and sporadic inter-client communications via binary indicator variables on a time-varying graph. It provides convergence guarantees for both convex and non-convex losses, accommodating heterogeneous and time-varying resources and gradient noise, and shows that existing decentralized methods are special cases of DSpodFL. Theoretical results establish conditions under which the algorithm converges and quantify the impact of sporadicity on convergence rate and consensus. Empirical results on FMNIST and CIFAR-10 demonstrate that DSpodFL achieves faster training with lower delay than baseline decentralized methods, particularly under high heterogeneity and dynamics, underscoring its practical value in resource-constrained, serverless settings.

Abstract

Decentralized federated learning (DFL) captures FL settings where both (i) model updates and (ii) model aggregations are exclusively carried out by the clients without a central server. Existing DFL works have mostly focused on settings where clients conduct a fixed number of local updates between local model exchanges, overlooking heterogeneity and dynamics in communication and computation capabilities. In this work, we propose Decentralized Sporadic Federated Learning ($\texttt{DSpodFL}$), a DFL methodology built on a generalized notion of $\textit{sporadicity}$ in both local gradient and aggregation processes. $\texttt{DSpodFL}$ subsumes many existing decentralized optimization methods under a unified algorithmic framework by modeling the per-iteration (i) occurrence of gradient descent at each client and (ii) exchange of models between client pairs as arbitrary indicator random variables, thus capturing $\textit{heterogeneous and time-varying}$ computation/communication scenarios. We analytically characterize the convergence behavior of $\texttt{DSpodFL}$ for both convex and non-convex models and for both constant and diminishing learning rates, under mild assumptions on the communication graph connectivity, data heterogeneity across clients, and gradient noises. We show how our bounds recover existing results from decentralized gradient descent as special cases. Experiments demonstrate that $\texttt{DSpodFL}$ consistently achieves improved training speeds compared with baselines under various system settings.
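The sporadicity idea in the abstract can be sketched in a few lines of NumPy. This is a toy illustration, not the authors' implementation. It assumes that each client's update is a consensus step over the links active in that iteration plus a gradient step that is applied only when the client's indicator is 1, and that all indicators are independent Bernoulli draws. The function name `dspodfl_sketch`, the quadratic local losses, the constant mixing weight `1/m`, and the parameters `d` and `b` are hypothetical choices made for this example.

```python
import numpy as np

def dspodfl_sketch(targets, d, b, alpha=0.1, iters=300, seed=0):
    """Toy sporadic decentralized GD on f_i(x) = 0.5 * ||x - targets[i]||^2.

    d[i]    : assumed probability that client i takes a gradient step
    b[i, j] : assumed probability that link (i, j) is used (symmetric, 0 = no link)
    """
    rng = np.random.default_rng(seed)
    m, n = targets.shape
    theta = np.zeros((m, n))
    # Constant mixing weight (an assumption); a client has at most m - 1
    # neighbors, so its total mixing weight stays below 1.
    r = 1.0 / m
    for _ in range(iters):
        # Indicator of a local gradient step at each client.
        v = rng.random(m) < d
        # Indicator of a model exchange on each link, sampled once per pair
        # and mirrored so that both endpoints agree.
        links = np.triu(rng.random((m, m)) < b, 1)
        links = links | links.T
        # Exact gradients of the toy quadratic losses (no gradient noise here).
        grads = theta - targets
        # Consensus term: sum_j r * links[i, j] * (theta_j - theta_i).
        a = links.astype(float)
        mix = r * (a @ theta - a.sum(axis=1, keepdims=True) * theta)
        # Gradient step gated by the per-client indicator.
        theta = theta + mix - alpha * v[:, None] * grads
    return theta

if __name__ == "__main__":
    targets = np.array([[1.0, 2.0], [3.0, 0.0], [5.0, 4.0], [7.0, 2.0]])
    m = targets.shape[0]
    d = np.array([1.0, 0.6, 0.3, 0.9])      # heterogeneous compute capabilities
    b = 0.5 * (np.ones((m, m)) - np.eye(m)) # every link active half of the time
    print(dspodfl_sketch(targets, d, b))
```

Setting every entry of `d` to 1 and every off-diagonal entry of `b` to 1 makes all indicators equal to 1 in every iteration, which corresponds to the always-compute, always-communicate pattern of decentralized gradient descent that the paper describes as a special case.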

Paper Structure (68 sections, 31 theorems, 195 equations, 11 figures, 5 tables, 1 algorithm)

Introduction
Related Works
Decentralized Sporadic Federated Learning
DSpodFL: Decentralized FL with Sporadicity
Key Takeaways from DSpodFL
Matrix Form of Updates in DSpodFL
Convergence Analysis
Definitions and Assumptions
Average Model Error and Consensus Error
Sufficient Condition for Convergence
Main Theorem and Discussions for Convex Case
Analysis for Non-Convex Case
Numerical Evaluation
Conclusion and Limitations
Notation
...and 53 more sections

Key Result

Lemma 4.7

(See the appendix for the proof.) Let the assumptions on smoothness, convexity, and data diversity and on gradient errors hold. For each iteration $k \geq 0$, we have the following bound on the expected average model error: $\mathbb{E}_{\mathbf{\Xi}^{(k)}} [ \| \bar{\boldsymbol{\theta}}^{(k+1)} - \cdots$ […]

Figures (11)

Figure 1: Illustrations of centralized FL and of different consensus-based decentralized optimization algorithms (DGD, DFedAvg, RG, and DSpodFL panels). In decentralized gradient descent (DGD), local updates and inter-client communications occur at every iteration of training. The DFedAvg panel depicts decentralized local SGD, where communications occur only every $D$-th iteration. In the DGD and DFedAvg panels, communication and computation operations are carried out in a deterministic pattern (solid lines, with thickness representing relative frequency). Randomized gossip (RG) adopts sporadic communications for aggregations. DSpodFL considers sporadicity in both communications and computations (dashed lines), where the number of local SGD steps and the period of model aggregations are heterogeneous across clients and vary over time.
Figure 2: Accuracy vs. latency plots. DSpodFL achieves the target accuracy much faster with less delay, emphasizing the benefit of sporadicity in DFL for SGD iterations and model aggregations simultaneously.
Figure 3: Effects of system parameters on FMNIST. In the data-distribution, graph-connectivity, and number-of-clients panels, client and link capabilities $d_i$ and $b_{ij}$ are sampled from a uniform distribution $\mathcal{U}(0,1]$. The overall results confirm the advantage of DSpodFL.
Figure 4: Scalability to larger numbers of clients and robustness to different distributions on FMNIST.
Figure 5: Accuracy vs. latency plots obtained in different setups where the SGD and aggregation probabilities are sampled from the uniform distribution $\mathcal{U}(0,1]$. DSpodFL achieves the target accuracy much faster with less delay, emphasizing the benefit of sporadicity in DFL for SGD iterations and model aggregations simultaneously.
...and 6 more figures

Theorems & Definitions (34)

Definition 4.5
Definition 4.6: Indicator variables
Lemma 4.7: Average model error
Lemma 4.8: Consensus error
Definition 4.9: Error vector
Proposition 4.10: Spectral radius
Theorem 4.11: Strongly-convex convergence result
Theorem 4.12: Non-convex convergence result
Lemma D.1: Gradient bounds
Lemma D.2: Expected value of SGD noise average and deviation
...and 24 more
