FedStale: leveraging stale client updates in federated learning

Angelo Rodio; Giovanni Neglia

TL;DR

FedStale addresses variance and bias in federated learning under heterogeneous client data and participation by introducing a tunable aggregation weight $\beta$ that blends fresh FedAvg updates with stale FedVARP updates. The authors provide a unified convergence analysis that encompasses FedAvg and FedVARP, derive practical guidelines for selecting $\beta$ based on data and participation heterogeneity, and validate the approach with extensive experiments on MNIST and CIFAR-10, in which FedStale often outperforms both baselines. They further show that the method remains robust when participation probabilities are estimated online, which supports its practical viability in real-world scenarios with uneven participation. Overall, FedStale offers a principled, memory-efficient, and adaptable framework for exploiting stale updates to mitigate variance and objective inconsistency in heterogeneous federated settings.

Abstract

Federated learning algorithms, such as FedAvg, are negatively affected by data heterogeneity and partial client participation. To mitigate the latter problem, global variance reduction methods, like FedVARP, leverage stale model updates for non-participating clients. These methods are effective under homogeneous client participation. Yet, this paper shows that, when some clients participate much less than others, aggregating updates with different levels of staleness can detrimentally affect the training process. Motivated by this observation, we introduce FedStale, a novel algorithm that updates the global model in each round through a convex combination of "fresh" updates from participating clients and "stale" updates from non-participating ones. By adjusting the weight in the convex combination, FedStale interpolates between FedAvg, which only uses fresh updates, and FedVARP, which treats fresh and stale updates equally. Our analysis of FedStale convergence yields the following novel findings: i) it integrates and extends previous FedAvg and FedVARP analyses to heterogeneous client participation; ii) it underscores how the least participating client influences convergence error; iii) it provides practical guidelines to best exploit stale updates, showing that their usefulness diminishes as data heterogeneity decreases and participation heterogeneity increases. Extensive experiments featuring diverse levels of client data and participation heterogeneity not only confirm these findings but also show that FedStale outperforms both FedAvg and FedVARP in many settings.
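The interpolation described in the abstract can be illustrated with a minimal sketch. The exact update rule is not given in this summary, so the form below is an assumption: the server keeps one memorized update per client, weights the memory by $\beta$, and corrects it with the fresh updates of participating clients, scaled by the inverse of their participation probability. The names `fedstale_aggregate`, `fresh`, `stale`, and `probs` are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def fedstale_aggregate(
    fresh: Dict[int, Vector],  # updates from clients that participated this round
    stale: List[Vector],       # last memorized update of every client
    probs: List[float],        # participation probability of every client
    beta: float,               # weight on stale updates, 0 <= beta <= 1
) -> Vector:
    """Assumed aggregation rule: beta-weighted memory plus a fresh correction.

    beta = 0 ignores the memory (FedAvg-style, importance-weighted);
    beta = 1 treats fresh and stale updates equally (FedVARP-style).
    """
    n = len(stale)
    dim = len(stale[0])
    # Stale part: beta times the mean of all memorized updates.
    agg = [beta * sum(stale[i][d] for i in range(n)) / n for d in range(dim)]
    # Fresh part: each participant replaces beta times its memory,
    # reweighted by 1 / p_i to compensate for uneven participation.
    for i, delta in fresh.items():
        for d in range(dim):
            agg[d] += (delta[d] - beta * stale[i][d]) / (n * probs[i])
    return agg


# Two clients; only client 0 participates in this round.
stale = [[1.0, 0.0], [0.0, 2.0]]
probs = [1.0, 0.5]
fresh = {0: [2.0, 2.0]}
for beta in (0.0, 0.5, 1.0):
    print(beta, fedstale_aggregate(fresh, stale, probs, beta))
```

After aggregation, the server would overwrite the memory of each participating client with its fresh update; that step is omitted here to keep the sketch short.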

Paper Structure (26 sections, 20 theorems, 115 equations, 5 figures, 3 algorithms)


Introduction
Problem Definition and Background
The FedStale Algorithm
A motivating example
A convex combination of fresh and stale updates
Comparison to related work
Convergence Analysis
Finding the optimal weight $\beta^*$
Experimental Results
Experimental setup
Existence of different regimes
Online estimation of participation probabilities
Conclusion
Appendix
Proof sketch, Theorem 1
...and 11 more sections

Key Result

Theorem 1

Under Assumptions asm:smoothness--asm:participation, if the client and server learning rates, $\eta_c$ and $\eta_s$, are chosen such that $\eta \leq \frac{1}{8LK}$ and $\eta \leq \min \left\{ \frac{N p_{\text{var}}}{12(1-\beta)^2}, \frac{p_{\text{var}}p_{\text{min}}}{3\beta^2 p_{\text{avg}}} \right\}$, ... where $F^* \triangleq \min_{\bm{w}} F(\bm{w})$, $H^{(1)} \triangleq \frac{1}{N}\sum_{i=1}^{N} \| \dots$

Figures (5)

Figure 1: Comparison of FedAvg, FedVARP, and FedStale in a two-client, 2D quadratic setting with heterogeneous client participation. Panel 1: Contour plots of client objectives, their local optima, and the global optimum. The client participation ratio is $p_1/p_2=100$. Panel 2: Trajectories of FedAvg and FedVARP over $T=4000$ rounds with $K=5$ local iterations each. While both algorithms target the global optimum, FedAvg struggles with large variance and FedVARP follows suboptimal paths due to stale updates. Panel 3: FedStale ($\beta=0.8$) follows a more stable trajectory under heterogeneous client participation. Panel 4: Learning curves of FedAvg, FedVARP, and FedStale over 10 runs. With a lower weight on stale updates ($\beta=0.8$), FedStale achieves faster convergence to the global optimum.
Figure 2: $\beta_{\text{opt}}$ values for FedAvg ($\beta=0$), FedVARP ($\beta=1$), and FedStale ($\beta \in \{0.2, 0.5, 0.8\}$) across 48 heterogeneity settings on the MNIST dataset. Color gradients range from lighter shades ($\beta_{\text{opt}}=0$) to darker shades ($\beta_{\text{opt}}=1$).
Figure 3: Test accuracy of FedAvg ($\beta$=0), FedVARP ($\beta$=1), and FedStale ($\beta$=0.5) varying data heterogeneity at fixed participation ratio $p_{\text{avg}}/p_{\text{min}}=10$.
Figure 4: Test accuracy of FedAvg ($\beta$=0), FedVARP ($\beta$=1), and FedStale ($\beta$=0.5) varying client participation ratio at fixed data heterogeneity $\hat{\sigma}_g^2=0.6$.
Figure 5: "Exact" vs. "Estimated" participation probabilities, $\hat{\sigma}_g^2=0.6$.
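
The "Estimated" setting in Figure 5 relies on participation probabilities that are learned during training. This summary does not say which estimator the authors use; the sketch below assumes the simplest option, an empirical participation frequency with a small floor so that inverse-probability weights stay bounded. The class name `ParticipationEstimator` and the `floor` parameter are hypothetical.

```python
from typing import Iterable, List


class ParticipationEstimator:
    """Running-frequency estimate of each client's participation probability.

    Assumption: p_i is approximated by (rounds in which client i
    participated) / (rounds observed so far), clipped below by `floor`.
    """

    def __init__(self, num_clients: int, floor: float = 1e-3) -> None:
        self.counts = [0] * num_clients
        self.rounds = 0
        self.floor = floor

    def observe(self, participants: Iterable[int]) -> None:
        """Record the set of clients that participated in one round."""
        self.rounds += 1
        for i in set(participants):
            self.counts[i] += 1

    def estimate(self) -> List[float]:
        """Return the current estimates; all ones before any round is seen."""
        if self.rounds == 0:
            return [1.0] * len(self.counts)
        return [max(c / self.rounds, self.floor) for c in self.counts]


est = ParticipationEstimator(num_clients=3)
for round_participants in ([0, 1], [0], [0, 2], [0]):
    est.observe(round_participants)
print(est.estimate())
```

The floor trades a small bias for stability: a client that has never been observed would otherwise receive an unbounded weight the first time it participates.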

Theorems & Definitions (40)

Theorem 1: Convergence of FedStale, upper bound
Corollary 2: Convergence of FedAvg, upper bound
Corollary 3: Convergence of FedVARP, upper bound
Theorem 4: Convergence of FedStale, lower bound
Remark 1
proof
Remark 2
Lemma 1: Descent lemma
proof: Proof of Lemma 1
Lemma 2: Expected value of the local stochastic pseudo-gradients
...and 30 more
