CAFe: Cost and Age aware Federated Learning
Sahan Liyanaarachchi, Kanchana Thilakarathna, Sennur Ulukus
TL;DR
The paper tackles resource wastage and communication costs in federated learning with heterogeneous client resources by introducing the Age of Clients (AoC) as a convergence-relevant metric. It analyzes a minimal-learner MCU scheme with a reporting deadline, derives closed-form expressions for resource wastage, communication cost, and AoC, and proves that setting $M=1$ often optimizes these metrics while also linking AoC to convergence bounds. To address heterogeneity, two schemes—Age Weighted Update (AWU) and Aggregated Gradient Update (AGU)—are proposed, and their effectiveness is demonstrated via MNIST experiments under IID and non-IID data distributions. The work provides a principled framework for selecting $M$ and $T$ to balance efficiency and convergence, and offers practical enhancements for robustness against biased or adversarial clients in heterogeneous FL environments.
Abstract
In many federated learning (FL) models, a common strategy for ensuring progress in training is to wait, once the parameter server (PS) has broadcast the global model, for at least $M$ of the $N$ clients to send back their local gradients before a reporting deadline $T$. If fewer than $M$ clients report back within the deadline, the round is considered failed and is restarted from scratch. If at least $M$ clients respond, the round is deemed successful and the local gradients of all clients that responded are used to update the global model. In either case, the clients that failed to report an update within the deadline have wasted their computational resources. A tighter deadline (small $T$) combined with a larger required number of participating clients (large $M$) leads to many failed rounds and therefore to greater communication cost and computational resource wastage. However, a larger $T$ leads to longer round durations, whereas a smaller $M$ may lead to noisy gradients. Therefore, the parameters $M$ and $T$ need to be optimized so that communication cost and resource wastage are minimized while an acceptable convergence rate is maintained. In this regard, we show that the average age of a client at the PS appears explicitly in the theoretical convergence bound and can therefore be used as a metric to quantify the convergence of the global model. We provide an analytical scheme to select the parameters $M$ and $T$ in this setting.
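The round rule described above can be illustrated with a minimal simulation sketch. It is not the paper's analytical scheme: the function names `run_round` and `simulate`, the exponential response-time model, and the `rate` parameter are assumptions made purely for illustration. Following the abstract, only clients that miss the deadline are counted as having wasted their computation.

```python
import random


def run_round(response_times, M, T):
    """Classify one round from per-client response times (hypothetical helper).

    A round succeeds when at least M clients respond within the deadline T.
    Clients that respond after T are counted as wasted, in either case.
    """
    on_time = [i for i, t in enumerate(response_times) if t <= T]
    late = [i for i, t in enumerate(response_times) if t > T]
    success = len(on_time) >= M
    return success, on_time, late


def simulate(N, M, T, rounds=1000, rate=1.0, seed=0):
    """Estimate the failed-round fraction and mean wasted clients per round.

    Assumption: response times are i.i.d. exponential with the given rate.
    This is a modelling choice for the sketch, not taken from the source.
    """
    rng = random.Random(seed)
    failed = 0
    wasted = 0
    for _ in range(rounds):
        times = [rng.expovariate(rate) for _ in range(N)]
        success, _, late = run_round(times, M, T)
        failed += 0 if success else 1
        wasted += len(late)
    return failed / rounds, wasted / rounds


if __name__ == "__main__":
    # Compare a lenient threshold (M=1) with a strict one (M=N) at the same deadline.
    for M in (1, 5, 10):
        fail_frac, mean_wasted = simulate(N=10, M=M, T=1.0)
        print(f"M={M}: failed fraction={fail_frac:.3f}, wasted/round={mean_wasted:.2f}")
```

With a fixed seed the same response times are drawn for every value of $M$, so the sketch isolates the effect of the threshold: raising $M$ can only increase the fraction of failed rounds, while the number of clients missing the deadline depends on $T$ alone.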
