Mitigating Participation Imbalance Bias in Asynchronous Federated Learning
Xiangyu Chang, Manyi Yao, Srikanth V. Krishnamurthy, Christian R. Shelton, Anirban Chakraborty, Ananthram Swami, Samet Oymak, Amit Roy-Chowdhury
TL;DR
The paper addresses participation imbalance bias in asynchronous federated learning (AFL). It shows that heterogeneity amplification arises when fast clients, through staleness and partial participation, contribute a disproportionate share of the updates. It introduces a unifying mean-squared-error framework that decomposes gradient error into sampling noise, bias, and delay terms, and proves that the bias term vanishes only when the server aggregates over all clients. Building on this, ACE performs immediate, non-buffered updates that average gradients from all $n$ clients, which eliminates the bias term and yields convergence rates robust to arbitrary data heterogeneity; ACED extends ACE with a delay-aware mechanism to handle extreme delays and dropouts. Theoretical results show that ACE achieves a rate that does not rely on a bounded data heterogeneity (BDH) assumption, with optimal learning rate scaling $\eta^* \propto \sqrt{\frac{n}{T}}$. Empirical results across vision and NLP tasks confirm faster convergence and better final accuracy under high heterogeneity and delay, with ACED offering practical trade-offs for real-world systems. Overall, the work provides a principled framework for understanding AFL dynamics and introduces scalable all-client aggregation methods that improve convergence and stability in non-IID, delayed environments.
Abstract
In Asynchronous Federated Learning (AFL), the central server immediately updates the global model with each arriving client's contribution. As a result, clients perform their local training on different model versions, causing information staleness (delay). In federated environments with non-IID local data distributions, this asynchronous pattern amplifies the adverse effect of client heterogeneity (different data distributions, local objectives, etc.): faster clients contribute updates more frequently, biasing the global model. We term this phenomenon heterogeneity amplification. Our work provides a theoretical analysis that maps AFL design choices to their resulting error sources when heterogeneity amplification occurs. Guided by our analysis, we propose ACE (All-Client Engagement AFL), which mitigates participation imbalance through immediate, non-buffered updates that use the latest information available from all clients. We also introduce a delay-aware variant, ACED, to balance client diversity against update staleness. Experiments on multiple models and tasks across diverse heterogeneity and delay settings validate our analysis and demonstrate the robust performance of our approaches.
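The all-client update described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `AllClientServer`, its method names, and the use of plain NumPy vectors are assumptions made for illustration. The server keeps the most recent gradient received from each of the $n$ clients and, on every arrival, steps along the average of all cached gradients, so a fast client overwrites its own slot instead of gaining extra weight. The optional `max_delay` cutoff is a hypothetical stand-in for a delay-aware rule in the spirit of ACED; the text above does not specify the actual mechanism.

```python
import numpy as np


class AllClientServer:
    """Toy server that averages the latest cached gradient of every client."""

    def __init__(self, w0, init_grads, lr, max_delay=None):
        self.w = np.asarray(w0, dtype=float).copy()
        # One cached gradient per client, shape (n_clients, dim).
        self.cache = np.asarray(init_grads, dtype=float).copy()
        # Server step of the model each cached gradient was computed on.
        self.computed_at = np.zeros(len(self.cache), dtype=int)
        self.step = 0
        self.lr = lr
        # Hypothetical staleness cutoff; None means "use all clients".
        self.max_delay = max_delay

    def receive(self, client_id, grad, model_step):
        """Handle one arriving gradient and update the model immediately."""
        self.cache[client_id] = grad
        self.computed_at[client_id] = model_step

        delays = self.step - self.computed_at
        if self.max_delay is None:
            active = np.ones(len(self.cache), dtype=bool)
        else:
            active = delays <= self.max_delay
            active[client_id] = True  # the arriving client always counts

        # Non-buffered update: one step per arrival, averaged over clients.
        self.w = self.w - self.lr * self.cache[active].mean(axis=0)
        self.step += 1
        # The client would start its next local computation from this model.
        return self.w.copy(), self.step
```

In this sketch a client calls `receive` with its gradient and the step number of the model it trained on, then continues from the returned model. With `max_delay=None` every update uses all clients; with a finite cutoff, clients whose cached gradient is too stale are left out of the average until they report again.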
