Flight: A FaaS-Based Framework for Complex and Hierarchical Federated Learning
Nathaniel Hudson, Valerie Hayot-Sasson, Yadu Babuji, Matt Baughman, J. Gregory Pauloski, Ryan Chard, Ian Foster, Kyle Chard
TL;DR
Flight addresses the scalability and topology limitations of traditional federated learning (FL) by delivering a flexible, open-source framework that natively supports hierarchical FL (HFL), asynchronous aggregation, and decoupled control and data planes. It introduces a modular architecture with tree-based network topologies, multiple launchers (local, Parsl, Globus Compute), and a ProxyStore-enabled data plane, enabling efficient wide-area deployments. Empirical results show that Flight scales beyond Flower, supporting up to 2048 workers, reduces communication costs by over 60% in HFL, and shortens makespan with asynchronous execution. The work demonstrates practical applicability for distributed edge/IoT settings and provides a foundation for future automated topology optimization and broader real-world deployments.
Abstract
Federated Learning (FL) is a decentralized machine learning paradigm in which models are trained on distributed devices and aggregated at a central server. Existing FL frameworks assume simple two-tier network topologies in which end devices are directly connected to the aggregation server. While this is a practical mental model, it does not exploit the inherent topology of real-world distributed systems such as the Internet of Things. We present Flight, a novel FL framework that supports complex hierarchical multi-tier topologies and asynchronous aggregation, and that decouples the control plane from the data plane. We compare the performance of Flight against Flower, a state-of-the-art FL framework. Our results show that Flight scales beyond Flower, supporting up to 2048 simultaneous devices, and reduces FL makespan across several models. Finally, we show that Flight's hierarchical FL model can reduce communication overheads by more than 60%.
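
To make the contrast between two-tier and hierarchical aggregation concrete, the following is a minimal sketch of sample-weighted averaging over a tree topology. It is not Flight's API: the `Node` class, the `aggregate` function, the worker names, and the toy parameter values are all hypothetical, and plain sample-weighted averaging is assumed as the aggregation rule.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical tree node: a worker (leaf) or an aggregator (has children)."""
    name: str
    children: List["Node"] = field(default_factory=list)
    params: Optional[List[float]] = None  # locally trained parameters (workers only)
    num_samples: int = 0                  # local dataset size (workers only)


def aggregate(node: Node) -> Tuple[List[float], int]:
    """Return (parameters, sample count) for the subtree rooted at `node`.

    Workers return their own parameters. Aggregators return the
    sample-weighted average of their children's results, so each
    aggregator forwards a single model to its parent.
    """
    if not node.children:
        return node.params, node.num_samples
    results = [aggregate(child) for child in node.children]
    total = sum(n for _, n in results)
    dim = len(results[0][0])
    averaged = [sum(p[i] * n for p, n in results) / total for i in range(dim)]
    return averaged, total


# Toy data (assumed values): four workers with one-parameter "models".
workers = [
    Node("w0", params=[1.0], num_samples=10),
    Node("w1", params=[2.0], num_samples=30),
    Node("w2", params=[3.0], num_samples=20),
    Node("w3", params=[4.0], num_samples=40),
]

# Two-tier topology: every worker reports directly to the server.
flat_root = Node("server", children=list(workers))
flat_params, flat_total = aggregate(flat_root)

# Three-tier topology: two intermediate aggregators sit between workers and server.
root = Node("server", children=[
    Node("agg_a", children=workers[:2]),
    Node("agg_b", children=workers[2:]),
])
hier_params, hier_total = aggregate(root)

# Number of models the server receives per round in each topology.
flat_inbound = len(flat_root.children)
hier_inbound = len(root.children)
```

In this toy setup both topologies yield the same global parameters up to floating-point rounding, because sample counts are propagated up the tree alongside the averaged parameters. The server receives two models per round in the three-tier case rather than four. Any actual communication savings depend on the topology and on link costs, which this sketch does not model.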
