FedStaleWeight: Buffered Asynchronous Federated Learning with Fair Aggregation via Staleness Reweighting

Jeffrey Ma; Alan Tu; Yiling Chen; Vijay Janapa Reddi

TL;DR

The paper tackles fairness and convergence in cross-device asynchronous federated learning (AFL) by addressing the bias that compute heterogeneity introduces toward faster agents. It introduces FedStaleWeight, a buffered AFL algorithm that reweights updates based on staleness through a welfare-maximization framework to ensure truthful reporting and fair participation. The authors provide a convergence guarantee for the method in smooth non-convex settings and demonstrate empirically that, under non-IID data, FedStaleWeight converges faster and reaches higher final accuracy than the baseline FedBuff with gradient averaging. An open-source test bench is released to facilitate exploration of buffered AFL strategies, enabling further research into fair and scalable asynchronous learning.

Abstract

Federated Learning (FL) endeavors to harness decentralized data while preserving privacy, facing challenges of performance, scalability, and collaboration. Asynchronous Federated Learning (AFL) methods have emerged as promising alternatives to synchronous methods, whose progress is bounded by the slowest agent, yet they introduce additional challenges in convergence guarantees, fairness with respect to compute heterogeneity, and incorporation of staleness in aggregated updates. Specifically, AFL biases model training heavily towards agents that produce updates faster, leaving behind slower agents, whose data is often distributed differently and therefore goes unlearned by the global model. Naively upweighting slower agents introduces incentive issues: genuinely fast agents may falsely report a slower update speed to increase their contribution to model training. We introduce FedStaleWeight, an algorithm that addresses fairness in aggregating asynchronous client updates by using average staleness to compute fair re-weightings. FedStaleWeight reframes asynchronous federated learning aggregation as a mechanism design problem, devising a weighting strategy that upweights agent updates based on staleness, incentivizing truthful reporting of compute speed without favoring agents that produce updates faster. Leveraging only observed agent update staleness, FedStaleWeight achieves more equitable aggregation on a per-agent basis. We provide theoretical convergence guarantees in the smooth, non-convex setting and empirically compare FedStaleWeight against the commonly used asynchronous FedBuff with gradient averaging, demonstrating that it achieves stronger fairness and thereby converges faster to a higher global model accuracy. Finally, we provide an open-source test bench to facilitate exploration of buffered AFL aggregation strategies, fostering further research in asynchronous federated learning paradigms.
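To make the idea of staleness-based reweighting in a buffered aggregator concrete, the following is a minimal sketch, not the paper's algorithm. The abstract does not state the weighting formula, so `staleness_weights` uses a hypothetical stand-in (weights proportional to staleness plus a smoothing constant `eps`), and the function names, the `server_lr` parameter, and the convention that updates are deltas added to the model are all assumptions made for illustration.

```python
from typing import List, Sequence


def staleness_weights(staleness: Sequence[float], eps: float = 1.0) -> List[float]:
    """Hypothetical weighting rule: weight grows with staleness.

    This is an illustrative stand-in, NOT the welfare-maximizing weights
    derived in the paper. `eps` keeps fresh updates (staleness 0) from
    receiving zero weight.
    """
    raw = [s + eps for s in staleness]
    total = sum(raw)
    return [r / total for r in raw]


def aggregate_buffer(
    model: Sequence[float],
    updates: Sequence[Sequence[float]],
    staleness: Sequence[float],
    server_lr: float = 1.0,
    reweight: bool = True,
) -> List[float]:
    """Apply one buffered aggregation step to a flat parameter vector.

    With reweight=False every buffered update gets weight 1/b, which
    corresponds to plain averaging over the buffer.
    """
    b = len(updates)
    weights = staleness_weights(staleness) if reweight else [1.0 / b] * b
    new_model = list(model)
    for w, update in zip(weights, updates):
        for i, u in enumerate(update):
            # Assumed convention: updates are deltas added to the model.
            new_model[i] += server_lr * w * u
    return new_model


# A buffer of two updates: one fresh (staleness 0), one stale (staleness 3).
model = [0.0, 0.0]
updates = [[1.0, 0.0], [0.0, 1.0]]
print(aggregate_buffer(model, updates, staleness=[0, 3]))
print(aggregate_buffer(model, updates, staleness=[0, 3], reweight=False))
```

Under this stand-in rule the stale update, which typically comes from a slower agent, contributes more to the aggregate than it would under equal weighting, which is the qualitative effect the abstract describes.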

Paper Structure (20 sections, 3 theorems, 39 equations, 3 figures, 1 table, 2 algorithms)

Introduction
Motivation and Background
Our Contribution
Related Work
Preliminaries
Asynchronous FL Setting
Defining Fairness in Update Aggregation
Step Size as Uncertainty in Approximation
Algorithm
Solving the Welfare Maximization Problem
Deriving Expected Influence From Staleness
Extending To Online Learning
Algorithm
Theoretical Results
Empirical Results
...and 5 more sections

Key Result

Theorem 1

Let $\eta_{\ell}^{(q)}$ be the local learning rate of client SGD in the q-th local step, and let $\alpha(Q) = \sum_{q=0}^{Q-1}{\eta_\ell^{(q)}}$, $\beta(Q) = \sum_{q=0}^{Q-1}{(\eta_\ell^{(q)})^2}$, and $U(\tau_{\max, b})$ be the maximum squared deviation between equal weighting ($1/b$) and a normali
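As a quick illustration of the two sums $\alpha(Q)$ and $\beta(Q)$ defined in the theorem statement, assume (hypothetically, this is not a setting taken from the paper) a constant local learning rate $\eta_\ell^{(q)} = \eta$ for every local step $q$. The sums then reduce to:

$$\alpha(Q) = \sum_{q=0}^{Q-1} \eta = Q\,\eta, \qquad \beta(Q) = \sum_{q=0}^{Q-1} \eta^2 = Q\,\eta^2.$$

For example, with $Q = 5$ local steps and $\eta = 0.1$ (illustrative values only), $\alpha(Q) = 0.5$ and $\beta(Q) = 0.05$.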

Figures (3)

Figure 1: Aggregation in buffered asynchronous federated learning. Updates accumulate in the buffer over time: for a buffer size of $b=4$, the diagram shows aggregation of updates and how model versions are incremented for each arriving update.
Figure 2: Example test accuracy curves for FedStaleWeight versus buffered FedAvg. FedStaleWeight converges more quickly to a higher accuracy under the same non-IID setting, indicating fairer aggregation of agent updates.
Figure : FedStaleWeight-server

Theorems & Definitions (3)

Theorem 1
Theorem 2
Lemma 1
