Mitigating Participation Imbalance Bias in Asynchronous Federated Learning

Xiangyu Chang, Manyi Yao, Srikanth V. Krishnamurthy, Christian R. Shelton, Anirban Chakraborty, Ananthram Swami, Samet Oymak, Amit Roy-Chowdhury

TL;DR

The paper addresses participation imbalance bias in asynchronous federated learning, showing that heterogeneity amplification arises when fast clients disproportionately influence updates due to staleness and partial participation. It introduces a unifying mean-squared-error framework that decomposes gradient error into sampling noise, bias, and delay terms, proving that bias vanishes only with full, all-client aggregation. Building on this, ACE implements immediate, non-buffered updates that average gradients from all $n$ clients, eliminating the bias term and achieving convergence rates robust to arbitrary data heterogeneity; ACED extends ACE with a delay-aware mechanism to handle extreme delays and dropouts. Theoretical results show ACE achieves a BDH-free rate with optimal learning rate scaling $\eta^* \propto \sqrt{\frac{n}{T}}$, and empirical results across vision and NLP tasks confirm faster convergence and better final accuracy under high heterogeneity and delay, with ACED offering practical trade-offs for real-world systems. Overall, the work provides a principled framework for understanding AFL dynamics and introduces scalable, robust all-client aggregation methods that improve convergence and stability in non-IID, delayed environments.

Abstract

In Asynchronous Federated Learning (AFL), the central server immediately updates the global model with each arriving client's contribution. As a result, clients perform their local training on different model versions, causing information staleness (delay). In federated environments with non-IID local data distributions, this asynchronous pattern amplifies the adverse effect of client heterogeneity (due to different data distributions, local objectives, etc.), as faster clients contribute more frequent updates, biasing the global model. We term this phenomenon heterogeneity amplification. Our work provides a theoretical analysis that maps AFL design choices to their resulting error sources when heterogeneity amplification occurs. Guided by our analysis, we propose ACE (All-Client Engagement AFL), which mitigates participation imbalance through immediate, non-buffered updates that use the latest information available from all clients. We also introduce a delay-aware variant, ACED, to balance client diversity against update staleness. Experiments on different models for different tasks across diverse heterogeneity and delay settings validate our analysis and demonstrate the robust performance of our approaches.
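The all-client update described in the abstract can be illustrated with a minimal sketch. This is not the paper's Algorithm 1: the class name `AllClientServer`, the per-client gradient cache, the choice to average only over clients that have reported at least once, and the age-based cutoff used for the delay-aware variant are all assumptions made for illustration. The parameter name `tau_algo` mirrors the $\tau_{\text{algo}}$ symbol in the figure captions, but its exact semantics in ACED may differ.

```python
import numpy as np

class AllClientServer:
    """Hypothetical server that keeps the latest gradient from every client."""

    def __init__(self, w0, n_clients, lr):
        self.w = np.asarray(w0, dtype=float).copy()
        self.lr = lr
        self.t = 0  # server iteration counter
        # Most recent gradient per client, and the server iteration at which
        # it arrived (-1 means the client has never reported).
        self.cache = np.zeros((n_clients, self.w.size))
        self.arrived_at = np.full(n_clients, -1)

    def receive(self, client_id, grad, tau_algo=None):
        """Store one arriving gradient, then step immediately (no buffering)."""
        self.cache[client_id] = grad
        self.arrived_at[client_id] = self.t
        # Assumption: average over every client that has reported so far.
        active = self.arrived_at >= 0
        if tau_algo is not None:
            # Assumed delay-aware rule: drop cached gradients that arrived
            # more than tau_algo server iterations ago.
            active &= (self.t - self.arrived_at) <= tau_algo
        self.w -= self.lr * self.cache[active].mean(axis=0)
        self.t += 1
        return self.w.copy()

# Usage: a fast client (0) and a slower client (1) on a 2-d model.
server = AllClientServer(w0=[0.0, 0.0], n_clients=3, lr=0.1)
server.receive(0, [1.0, 1.0])   # only client 0 is cached so far
server.receive(1, [3.0, -1.0])  # step uses the mean of clients 0 and 1
```

In this sketch a client that reports often only refreshes its own cache row, so its weight in the average stays fixed, which is the intuition behind mitigating participation imbalance. Passing a small `tau_algo` trades that balance for fresher gradients.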

Paper Structure

This paper contains 56 sections, 14 theorems, 126 equations, 6 figures, 7 tables, 6 algorithms.

Key Result

Theorem 1

Suppose Assumptions A1-A5 hold. By choosing an appropriate global step size $\eta$ proportional to $\sqrt{n/T}$, ACE (Algorithm 1) achieves the following convergence rate for smooth non-convex objectives: where $\Delta = F(w^0) - F(w^T)$. (The proof can be found in the appendix.)
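As a small illustration of the step-size scaling in Theorem 1, the helper below computes $\eta = c\sqrt{n/T}$. The function name `ace_step_size` and the constant `c` are hypothetical; the theorem only states proportionality, so `c` would have to be tuned in practice.

```python
import math

def ace_step_size(n_clients, n_iters, c=1.0):
    """Return eta = c * sqrt(n / T); c is a hypothetical tuning constant."""
    if n_clients <= 0 or n_iters <= 0:
        raise ValueError("n_clients and n_iters must be positive")
    return c * math.sqrt(n_clients / n_iters)

# Quadrupling the number of server iterations halves the step size.
eta_short = ace_step_size(n_clients=100, n_iters=10_000)
eta_long = ace_step_size(n_clients=100, n_iters=40_000)
```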

Figures (6)

  • Figure 1: Staleness and Heterogeneity Amplification in AFL. Left: Clients compute at varying speeds (arrow lengths) on their local datasets with heterogeneous data distributions ($\mathbb{P}_i$, colors). Color intensity reflects staleness: the degree to which a client's model version is outdated due to infrequent client-server communication. Right: Update sequences (during $t_0$ to $t_1$): 'Immediate Update' applies client updates on arrival; 'Buffered Update' waits and aggregates multiple clients' updates before applying. However, both strategies demonstrate heterogeneity amplification: faster clients (e.g., Client 3) contribute more frequently, resulting in their imbalanced influence. In contrast, the 'All-Client Update' strategy aims to balance updates (despite staleness) from all the clients and thereby mitigate heterogeneity amplification.
  • Figure 2: Impact of data heterogeneity (Dirichlet $\alpha$) and client delay (Exponential mean $\beta$) on CIFAR-10 test accuracy over 500 server iterations. (a) $\alpha=0.1$, low delay ($\beta = 5$). (b) $\alpha=0.3$, low delay. (c) $\alpha=0.1$, increased delay ($\beta = 30$). (d) $\alpha=0.3$, increased delay. ACE performs robustly across these heterogeneity and delay settings. Extended results are in the appendix.
  • Figure 3: Final test accuracy ($T=500$, Dir($\alpha=0.3$), $\beta=5$) vs. client dropout. (a) ACED ($\tau_{\text{algo}}=10$) shows superior dropout robustness compared to Conceptual ACE, CA$^2$FL, and Vanilla ASGD. (b) Ablation on ACED's $\tau_{\text{algo}}$: performance suffers if $\tau_{\text{algo}}$ is too small (partial participation bias) or too large (staleness error), but is stable across moderate $\tau_{\text{algo}}$ values.
  • Figure a.1: Extended performance comparison of AFL algorithms on CIFAR-10 up to 1000 server iterations, including stability analysis via error bars. The four subplots correspond to the scenarios detailed in the experiments section: (a) Dir (0.1), (b) Dir (0.3), (c) Dir (0.1) with increased delay, and (d) Dir (0.3) with increased delay. Shaded regions represent the standard deviation ($\pm\sigma$) of accuracy. The error bands clearly show that single-client update methods (Vanilla ASGD, Delay-Adaptive ASGD) exhibit higher variance, while multi-client aggregation methods (FedBuff, CA$^2$FL, and ACE) converge more stably.
  • Figure a.2: Comparative Performance of Asynchronous Federated Learning Algorithms on CIFAR-100 under Varying Data Heterogeneity and System Delays. The heatmaps illustrate the final test accuracy of six AFL algorithms: (a) ACE, (b) ACED ($\tau_{\text{algo}}=50$), (c) CA$^2$FL, (d) FedBuff, (e) Delay-Adaptive ASGD, and (f) Vanilla ASGD. The x-axis represents the Dirichlet distribution parameter $\alpha$ controlling client data non-IIDness (lower $\alpha$ indicates higher heterogeneity). The y-axis represents the mean $\beta$ of an exponential distribution modeling client delays (higher $\beta$ indicates greater system delay and straggler presence). Accuracy values are normalized across all heatmaps using a common color scale to facilitate direct comparison. Algorithms like ACE and ACED demonstrate strong performance and robustness, particularly maintaining higher accuracies under combined high heterogeneity and high delay conditions. In contrast, algorithms such as FedBuff, Delay-Adaptive ASGD, and Vanilla ASGD show a more pronounced degradation, illustrating the impact of heterogeneity amplification. ACED's performance at high delay (e.g., $\beta=30$) relative to ACE highlights its design for mitigating the impact of extreme stragglers.
  • ...and 1 more figure
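The captions above parameterize the experiments by a Dirichlet parameter $\alpha$ (data heterogeneity) and an exponential mean $\beta$ (client delay). The sketch below shows one common way to simulate such a setup; it is not taken from the paper. The function names `dirichlet_label_split` and `sample_delays`, and the per-class partitioning protocol, are assumptions for illustration.

```python
import numpy as np

def dirichlet_label_split(labels, n_clients, alpha, rng):
    """Split sample indices across clients, class by class, using
    Dirichlet(alpha) proportions (lower alpha gives more skewed clients)."""
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(n_clients)]
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        props = rng.dirichlet(alpha * np.ones(n_clients))
        # Cut points that divide this class's samples according to props.
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for cid, part in enumerate(np.split(idx, cuts)):
            client_indices[cid].extend(part.tolist())
    return client_indices

def sample_delays(n_clients, beta, rng):
    """Draw one delay per client from an exponential with mean beta."""
    return rng.exponential(scale=beta, size=n_clients)

# Usage with synthetic labels: 10 classes, 50 samples each, 5 clients.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(10), 50)
parts = dirichlet_label_split(labels, n_clients=5, alpha=0.3, rng=rng)
delays = sample_delays(n_clients=5, beta=5.0, rng=rng)
```

Each sample index is assigned to exactly one client, and clients with larger sampled delays play the role of stragglers in a simulated asynchronous run.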

Theorems & Definitions (27)

  • Theorem 1: Convergence Rate of ACE (Alg. 1)
  • Lemma a.1
  • proof
  • Lemma a.2
  • proof
  • Lemma a.3
  • proof
  • Lemma a.4: Descent Lemma
  • proof
  • Lemma a.5: Model Drift from Local Steps (reddiadaptive)
  • ...and 17 more