FedStale: leveraging stale client updates in federated learning
Angelo Rodio, Giovanni Neglia
TL;DR
FedStale addresses variance and bias in federated learning under heterogeneous client data and participation by introducing a tunable aggregation weight $\beta$ that interpolates between FedAvg, which uses only fresh updates, and FedVARP, which also reuses stale updates from non-participating clients. The authors provide a unified convergence analysis that covers both FedAvg and FedVARP, derive practical guidelines for selecting $\beta$ from the levels of data and participation heterogeneity, and validate the approach with experiments on MNIST and CIFAR-10 in which FedStale often outperforms both baselines. They further show that the method remains robust when participation probabilities are estimated online, which supports its use in real-world settings with uneven participation. Overall, FedStale offers a principled and adaptable framework for exploiting stale updates to mitigate variance and objective inconsistency in heterogeneous federated settings.
Abstract
Federated learning algorithms, such as FedAvg, are negatively affected by data heterogeneity and partial client participation. To mitigate the latter problem, global variance reduction methods, like FedVARP, leverage stale model updates for non-participating clients. These methods are effective under homogeneous client participation. Yet, this paper shows that, when some clients participate much less than others, aggregating updates with different levels of staleness can detrimentally affect the training process. Motivated by this observation, we introduce FedStale, a novel algorithm that updates the global model in each round through a convex combination of "fresh" updates from participating clients and "stale" updates from non-participating ones. By adjusting the weight in the convex combination, FedStale interpolates between FedAvg, which only uses fresh updates, and FedVARP, which treats fresh and stale updates equally. Our analysis of FedStale convergence yields the following novel findings: i) it integrates and extends previous FedAvg and FedVARP analyses to heterogeneous client participation; ii) it underscores how the least participating client influences convergence error; iii) it provides practical guidelines to best exploit stale updates, showing that their usefulness diminishes as data heterogeneity decreases and participation heterogeneity increases. Extensive experiments featuring diverse levels of client data and participation heterogeneity not only confirm these findings but also show that FedStale outperforms both FedAvg and FedVARP in many settings.
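The interpolation between FedAvg and FedVARP described above can be illustrated with a minimal server-side sketch. The abstract does not state the exact update rule, so the form below is an assumption: a $\beta$-weighted average of stored (stale) updates plus an inverse-probability-weighted correction from the participating clients. The function name `fedstale_aggregate` and the arguments `fresh`, `memory`, `probs`, and `beta` are hypothetical names chosen for illustration.

```python
import numpy as np

def fedstale_aggregate(fresh, memory, probs, beta):
    """One aggregation round (illustrative sketch, not the authors' code).

    fresh  : dict {client_id: update vector} for clients that participated
    memory : array of shape (N, d) with the last stored update of each client
    probs  : array of shape (N,) with participation probabilities
    beta   : weight in [0, 1] given to stale updates
    """
    n_clients = memory.shape[0]
    # Stale term: beta-weighted average of the stored updates of all clients.
    aggregate = beta * memory.mean(axis=0)
    # Fresh term: each participant replaces its (beta-weighted) stale update,
    # reweighted by the inverse of its participation probability.
    for i, delta in fresh.items():
        aggregate += (delta - beta * memory[i]) / (n_clients * probs[i])
    # Participants overwrite their stored update; the others keep theirs.
    new_memory = memory.copy()
    for i, delta in fresh.items():
        new_memory[i] = delta
    return aggregate, new_memory

# Toy round: 3 clients, only client 0 participates.
memory = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
fresh = {0: np.array([4.0, 4.0])}
probs = np.array([0.5, 0.5, 0.5])

agg_fedavg, _ = fedstale_aggregate(fresh, memory, probs, beta=0.0)
agg_fedvarp, new_memory = fedstale_aggregate(fresh, memory, probs, beta=1.0)
```

Under this assumed form, `beta=0.0` ignores the stored updates entirely (a FedAvg-style aggregate), while `beta=1.0` treats stored and fresh updates on an equal footing (a FedVARP-style aggregate); intermediate values trade off the two.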
