FLStore: Efficient Federated Learning Storage for non-training workloads

Ahmad Faraz Khan, Samuel Fountain, Ahmed M. Abdelmoniem, Ali R. Butt, Ali Anwar

TL;DR

FLStore tackles the high latency and cost of non-training workloads in federated learning by unifying the data and compute planes on a serverless cache and applying taxonomy-driven, locality-aware caching policies. It introduces a three-component architecture (Request Tracker, Cache Engine, Serverless Cache) that tracks data across disaggregated functions, routes requests to data-bearing functions, and persistently stores cold data to ensure fault tolerance. Empirical results show substantial improvements over cloud object stores and in-memory caches, with per-request latency reductions up to $99.94\%$ and cost savings approaching $99\%$, while maintaining scalability and fault tolerance. The work demonstrates that tailoring caching policies to FL’s iterative data access patterns and co-locating compute with data can dramatically improve the efficiency of non-training workloads and enable easier integration with existing FL frameworks.
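
The routing idea described above can be illustrated with a toy, in-process sketch. This is not FLStore's implementation or API: the names `CacheFunction`, `RequestTracker`, `put`, and `serve` are hypothetical, plain Python objects stand in for serverless functions, and a dictionary stands in for persistent storage. As a simplification, the sketch keeps a durable copy of every item rather than only cold data.

```python
from dataclasses import dataclass, field


@dataclass
class CacheFunction:
    """Hypothetical stand-in for one serverless function holding cached FL metadata."""
    name: str
    store: dict = field(default_factory=dict)

    def run(self, key, workload):
        # The workload executes where the data lives, so nothing is transferred.
        return workload(self.store[key])


class RequestTracker:
    """Hypothetical tracker that remembers which function caches which key."""

    def __init__(self, functions, persistent_store):
        self.functions = {f.name: f for f in functions}
        self.persistent_store = persistent_store  # stands in for durable storage
        self.location = {}  # key -> name of the function caching it

    def put(self, key, value, function_name):
        self.functions[function_name].store[key] = value
        self.location[key] = function_name
        self.persistent_store[key] = value  # simplification: durable copy of everything

    def serve(self, key, workload):
        name = self.location.get(key)
        if name is None or key not in self.functions[name].store:
            # Cache miss (for example, the function was reclaimed): reload the data.
            name = name or next(iter(self.functions))
            self.functions[name].store[key] = self.persistent_store[key]
            self.location[key] = name
        return name, self.functions[name].run(key, workload)


# Usage: cache one round's (made-up) update values, then run a workload next to them.
tracker = RequestTracker([CacheFunction("f0"), CacheFunction("f1")], persistent_store={})
tracker.put(("round", 3), [0.5, 1.5, 2.5], "f1")
where, avg = tracker.serve(("round", 3), lambda w: sum(w) / len(w))
```

In this sketch the request is routed to `f1` because the tracker recorded that `f1` holds the data; if that function's memory is lost, the next request reloads the data from the persistent copy before running the workload.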

Abstract

Federated Learning (FL) is an approach for privacy-preserving Machine Learning (ML), enabling model training across multiple clients without centralized data collection, with an aggregator server coordinating training, aggregating model updates, and storing metadata across rounds. In addition to training, a substantial part of FL systems consists of non-training workloads such as scheduling, personalization, clustering, debugging, and incentivization. Most existing systems rely on the aggregator to handle non-training workloads and use cloud services for data storage. This results in high latency and increased costs, because non-training workloads rely on large volumes of metadata, including weight parameters from client updates, hyperparameters, and aggregated updates across rounds. We propose FLStore, a serverless framework for efficient FL non-training workloads and storage. FLStore unifies the data and compute planes on a serverless cache, enabling locality-aware execution via tailored caching policies to reduce latency and costs. Per our evaluations, compared to a cloud object store based aggregator server, FLStore reduces per-request average latency by 71% and costs by 92.45%, with peak improvements of 99.7% and 98.8%, respectively. Compared to an in-memory cloud cache based aggregator server, FLStore reduces average latency by 64.6% and costs by 98.83%, with peak improvements of 98.8% and 99.6%, respectively. FLStore integrates with existing FL frameworks with minimal modifications, while also being fault-tolerant and highly scalable.

Paper Structure

This paper contains 49 sections, 2 equations, 19 figures, 2 tables.

Figures (19)

  • Figure 1: Non-training portion of latency in the total FL process per round, with 200 clients, the EfficientNet model (Tan and Le, 2021), 1000 training rounds, and the CIFAR10 dataset (Krizhevsky, 2009).
  • Figure 2: Non-training portion of cost in the total FL process per round, with 200 clients, the EfficientNet model (Tan and Le, 2021), 1000 training rounds, and the CIFAR10 dataset (Krizhevsky, 2009).
  • Figure 3: Data flow of serving non-training requests in conventional FL aggregators.
  • Figure 4: Average computation and communication latencies of non-training FL workloads.
  • Figure 5: FLStore architecture design.
  • ...and 14 more figures