Apodotiko: Enabling Efficient Serverless Federated Learning in Heterogeneous Environments

Mohak Chadha, Alexander Jensen, Jianfeng Gu, Osama Abboud, Michael Gerndt

TL;DR

Apodotiko tackles the challenge of efficient serverless Federated Learning in heterogeneous environments by introducing asynchronous, scoring-based client selection and fractional aggregation. The method computes a Client Efficiency Score integrating hardware capacity and local data size, and selects clients probabilistically while allowing late updates via staleness-weighted aggregation. Experimental results across MNIST, FEMNIST, Shakespeare, and Google Speech show substantial speedups (average 2.75x, max 7.03x) and a ~4x reduction in cold starts compared with baselines, especially in highly heterogeneous settings. These findings demonstrate practical benefits for deploying serverless FL in real-world, resource-diverse networks.

Abstract

Federated Learning (FL) is an emerging machine learning paradigm that enables the collaborative training of a shared global model across distributed clients while keeping the data decentralized. Recent works on designing systems for efficient FL have shown that utilizing serverless computing technologies, particularly Function-as-a-Service (FaaS) for FL, can enhance resource efficiency, reduce training costs, and alleviate the complex infrastructure management burden on data holders. However, current serverless FL systems still suffer from the presence of stragglers, i.e., slow clients that impede the collaborative training process. While strategies aimed at mitigating stragglers in these systems have been proposed, they overlook the diverse hardware resource configurations among FL clients. To this end, we present Apodotiko, a novel asynchronous training strategy designed for serverless FL. Our strategy incorporates a scoring mechanism that evaluates each client's hardware capacity and dataset size to intelligently prioritize and select clients for each training round, thereby minimizing the effects of stragglers on system performance. We comprehensively evaluate Apodotiko across diverse datasets, considering a mix of CPU and GPU clients, and compare its performance against five other FL training strategies. Results from our experiments demonstrate that Apodotiko outperforms other FL training strategies, achieving an average speedup of 2.75x and a maximum speedup of 7.03x. Furthermore, our strategy significantly reduces cold starts by a factor of four on average, demonstrating suitability in serverless environments.
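
The selection and aggregation ideas summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the score formula (hardware capacity times relative data size), the `1 / sqrt(1 + staleness)` decay, and all names (`Client`, `efficiency_score`, `select_clients`, `staleness_weight`, `aggregate`) are assumptions chosen only to show how score-proportional sampling and staleness-weighted averaging might fit together.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Client:
    client_id: str
    hardware_capacity: float  # hypothetical normalized capacity (e.g. GPU > CPU)
    num_samples: int          # size of the client's local dataset


def efficiency_score(client, max_samples):
    # Assumed score: capacity scaled by relative data size.
    # The paper's actual Client Efficiency Score may differ.
    return client.hardware_capacity * (client.num_samples / max_samples)


def select_clients(clients, k, rng):
    # Sample up to k distinct clients, each draw proportional to its score.
    pool = list(clients)
    max_samples = max(c.num_samples for c in pool)
    chosen = []
    for _ in range(min(k, len(pool))):
        weights = [efficiency_score(c, max_samples) for c in pool]
        idx = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(idx))
    return chosen


def staleness_weight(staleness):
    # Assumed polynomial decay; a fresh update (staleness 0) gets weight 1.
    return 1.0 / math.sqrt(1.0 + staleness)


def aggregate(updates, current_round):
    # updates: list of (parameter_vector, num_samples, round_trained).
    # Each update is weighted by its sample count times its staleness weight.
    coeffs = [n * staleness_weight(current_round - r) for _, n, r in updates]
    total = sum(coeffs)
    dim = len(updates[0][0])
    return [
        sum(c * params[i] for c, (params, _, _) in zip(coeffs, updates)) / total
        for i in range(dim)
    ]
```

Under these assumptions, a late update trained three rounds ago contributes half as much as a fresh update of the same size, so slow clients still influence the global model without dominating it.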

Paper Structure (22 sections, 2 equations, 7 figures, 3 tables, 3 algorithms)

Figures (7)

  • Figure 1: Comparing FedAvg [mcmahan2017communication] and FedLesScan [elzohairy2022fedlesscan] across various client-hardware resource configurations using FedLess [fedless]. The results are obtained using the non-IID data partitions of the Shakespeare dataset [caldas2018leaf] with 100 clients deployed on OpenFaaS [openfaas].
  • Figure 2: Comparing weighting functions for aggregating stale client model updates.
  • Figure 3: Client training durations for different hardware resource configurations with non-IID data partitions for the Shakespeare dataset [caldas2018leaf].
  • Figure 4: Comparing different evaluation metrics across the different FL strategies.
  • Figure 5: Comparing Apodotiko with FedBuff.
  • ...and 2 more figures