
Flight: A FaaS-Based Framework for Complex and Hierarchical Federated Learning

Nathaniel Hudson, Valerie Hayot-Sasson, Yadu Babuji, Matt Baughman, J. Gregory Pauloski, Ryan Chard, Ian Foster, Kyle Chard

TL;DR

Flight is a flexible, open-source federated learning (FL) framework that addresses the scalability and topology limitations of traditional FL by natively supporting hierarchical FL (HFL), asynchronous aggregation, and decoupled control and data planes. It introduces a modular architecture with tree-based network topologies, multiple launchers (local, Parsl, Globus Compute), and a ProxyStore-enabled data plane, enabling efficient wide-area deployments. Empirical results show that Flight scales beyond Flower (up to 2048 workers), reduces communication costs by over 60% with HFL, and reduces makespan when run asynchronously. The work demonstrates practical applicability to distributed edge/IoT settings and lays a foundation for future automated topology optimization and broader real-world deployments.
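The TL;DR mentions asynchronous aggregation but does not spell out Flight's update rule. The sketch below illustrates the general idea using a FedAsync-style rule, a technique borrowed from the broader FL literature and not necessarily the one Flight uses. The server merges each local model as soon as it arrives and damps updates that were computed against an older global model. The class name, its parameters, and the list-of-floats model representation are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AsyncAggregator:
    # Hypothetical class; Flight's own asynchronous strategy may differ.
    weights: List[float]   # current global model (flat parameter list)
    alpha: float = 0.5     # mixing rate for a perfectly fresh update
    decay: float = 0.5     # exponent of the polynomial staleness penalty
    version: int = 0       # number of updates applied so far

    def apply(self, local: List[float], base_version: int) -> None:
        """Merge one worker's model as soon as it arrives (no round barrier)."""
        # Staleness: global updates applied since this worker pulled its copy.
        staleness = self.version - base_version
        # Polynomial decay: the staler the update, the smaller its mixing weight.
        a = self.alpha * (staleness + 1) ** -self.decay
        self.weights = [(1 - a) * g + a * x for g, x in zip(self.weights, local)]
        self.version += 1


agg = AsyncAggregator(weights=[0.0, 0.0])
agg.apply([1.0, 1.0], base_version=0)  # fresh update: weights -> [0.5, 0.5]
agg.apply([1.0, 1.0], base_version=0)  # stale by one version, so mixed in more weakly
```

No worker waits on a round barrier, so slow workers do not stall fast ones. This is one plausible intuition for why asynchrony can shorten makespan. The trade-off is that stale updates must be damped so they do not drag the global model backward.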

Abstract

Federated Learning (FL) is a decentralized machine learning paradigm in which models are trained on distributed devices and aggregated at a central server. Existing FL frameworks assume simple two-tier network topologies where end devices are directly connected to the aggregation server. While this is a practical mental model, it does not exploit the inherent topology of real-world distributed systems like the Internet-of-Things. We present Flight, a novel FL framework that supports complex hierarchical multi-tier topologies and asynchronous aggregation, and that decouples the control plane from the data plane. We compare the performance of Flight against Flower, a state-of-the-art FL framework. Our results show that Flight scales beyond Flower, supporting up to 2048 simultaneous devices, and reduces FL makespan across several models. Finally, we show that Flight's hierarchical FL model can reduce communication overheads by more than 60%.
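The paper measured the reduction of more than 60% experimentally. The snippet below is only a back-of-the-envelope sketch of why an intermediate aggregator tier can cut traffic at the central server. It assumes that in each round every direct child of the coordinator downloads and uploads one full copy of the model. The model size and node counts are hypothetical, and the function names are illustrative rather than Flight APIs.

```python
MB = 1_000_000  # bytes per megabyte (decimal)


def coordinator_traffic(num_children: int, model_bytes: int) -> int:
    """Bytes through the coordinator per round: one download + one upload per direct child."""
    return 2 * num_children * model_bytes


def total_traffic(num_workers: int, num_aggregators: int, model_bytes: int) -> int:
    """Bytes over all links of a three-tier tree (worker<->aggregator plus aggregator<->coordinator)."""
    return 2 * (num_workers + num_aggregators) * model_bytes


model_bytes = 100 * MB        # hypothetical model size
workers, aggregators = 64, 4  # hypothetical topology

two_tier = coordinator_traffic(workers, model_bytes)        # 12,800 MB: every worker talks to the coordinator
three_tier = coordinator_traffic(aggregators, model_bytes)  # 800 MB: only aggregators talk to the coordinator
reduction = 1 - three_tier / two_tier                       # 0.9375, i.e. 1 - aggregators / workers
```

In this sketch, total traffic across all links actually grows slightly (13,600 MB). The saving comes from moving most transfers onto links between workers and nearby aggregators, so they no longer converge on the coordinator. Actual savings depend on the topology, the model size, and where the aggregators are placed.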

Paper Structure

This paper contains 24 sections, 4 equations, 9 figures, and 2 tables.

Figures (9)

  • Figure 1: High-level view of a standard two-tier FL system and process.
  • Figure 2: High-level view of a hierarchical FL system and process.
  • Figure 3: High-level view of Flight architecture. The Coordinator launches jobs to be run on Aggregators and Workers through the control plane, while data (e.g., model parameters, $\omega$) are transferred through a data plane. Each Worker trains its local copy of the model and sends back its locally-updated model to its parent (either the Coordinator or an Aggregator). Each Aggregator aggregates the responses of its children (Workers and other Aggregators alike). The Coordinator facilitates the entire process.
  • Figure 4: Example legal Flight network topologies: (a) simple two-tier network; (b) simple three-tier hierarchical network; (c) complex hierarchical network.
  • Figure 5: Weak scaling results: runtime of Flower vs. Flight (using Parsl and Parsl+RedisConnector) for a series of increasingly complex models (see the models table). Results show that Flight provides better performance and, in some cases, also scales to more workers.
  • ...and 4 more figures
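Figures 3 and 4 describe tree-shaped topologies. Workers are the leaves, Aggregators are interior nodes, and the Coordinator is the root. The code below is a minimal sketch under those assumptions. It first checks that a topology is a legal tree, then aggregates it bottom-up with sample-weighted (FedAvg-style) averaging. The `Node` class, its field names, the specific legality rules, and the choice of FedAvg are hypothetical illustrations rather than Flight's actual API. Models are represented as plain lists of floats.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    # Hypothetical node record; not Flight's actual topology API.
    node_id: str
    kind: str  # "coordinator", "aggregator", or "worker"
    parent: Optional[str] = None
    num_samples: int = 0  # local dataset size (workers only)
    weights: List[float] = field(default_factory=list)  # local model (workers only)


def validate_tree(nodes: Dict[str, Node]) -> Tuple[str, Dict[str, List[str]]]:
    """Check assumed tree rules; return the root id and a parent -> children map."""
    roots = [n.node_id for n in nodes.values() if n.parent is None]
    if len(roots) != 1 or nodes[roots[0]].kind != "coordinator":
        raise ValueError("need exactly one root, and it must be the coordinator")
    children: Dict[str, List[str]] = {nid: [] for nid in nodes}
    for n in nodes.values():
        if n.parent is not None:
            if n.parent not in nodes:
                raise ValueError(f"{n.node_id} has unknown parent {n.parent}")
            children[n.parent].append(n.node_id)
    for nid, kids in children.items():
        if nodes[nid].kind == "worker" and kids:
            raise ValueError(f"worker {nid} must be a leaf")
        if nodes[nid].kind != "worker" and not kids:
            raise ValueError(f"{nid} has no children to aggregate")
    # Every node must be reachable from the root (rules out cycles and islands).
    seen, stack = set(), [roots[0]]
    while stack:
        nid = stack.pop()
        seen.add(nid)
        stack.extend(children[nid])
    if len(seen) != len(nodes):
        raise ValueError("topology has a cycle or disconnected nodes")
    return roots[0], children


def aggregate(nodes: Dict[str, Node], children: Dict[str, List[str]], nid: str) -> Tuple[List[float], int]:
    """Sample-weighted (FedAvg-style) average of the subtree rooted at nid."""
    node = nodes[nid]
    if node.kind == "worker":
        return node.weights, node.num_samples
    # An aggregator (or the coordinator) combines whatever its children return,
    # whether those children are workers or other aggregators.
    results = [aggregate(nodes, children, c) for c in children[nid]]
    total = sum(n for _, n in results)
    dim = len(results[0][0])
    avg = [sum(w[i] * n for w, n in results) / total for i in range(dim)]
    return avg, total


nodes = {
    "c": Node("c", "coordinator"),
    "a1": Node("a1", "aggregator", parent="c"),
    "w1": Node("w1", "worker", parent="a1", num_samples=1, weights=[0.0]),
    "w2": Node("w2", "worker", parent="a1", num_samples=3, weights=[4.0]),
    "w3": Node("w3", "worker", parent="c", num_samples=4, weights=[2.0]),
}
root, children = validate_tree(nodes)
global_model, n = aggregate(nodes, children, root)  # ([2.5], 8)
```

Each subtree reports its total sample count alongside its averaged model, so the bottom-up result equals a flat FedAvg over all workers. Here that is (0·1 + 4·3 + 2·4) / 8 = 2.5. Under this weighting, adding aggregator tiers changes where the computation runs, not the result.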