Falafels: A Tool for Estimating Federated Learning Energy Consumption via Discrete Simulation
Andrew Mary Huet de Barochez, Stéphan Plassart, Sébastien Monnet
TL;DR
This paper addresses the challenge of estimating energy consumption and training time in Federated Learning (FL) by introducing Falafels, a discrete-simulation tool built on SimGrid. Falafels models computation and communication costs with a fast, energy-aware simulator, providing nearly instant feedback to guide design choices. Its contributions include extensible FL modelling with multiple topologies and learning algorithms expressed as finite-state machines (FSMs), integration of SimGrid's energy model, and an evolutionary optimization workflow. The results demonstrate fast exploration of configurations and energy reductions under certain heterogeneous and asynchronous setups, and the authors position Falafels as a complement to existing measurement and experimental frameworks.
Abstract
The growth in computational power and the data hunger of Machine Learning (ML) have led to a major shift of research efforts towards distributing ML models across multiple machines, resulting in even more powerful models. However, there exist many Distributed Artificial Intelligence paradigms, and for each of them the platform and algorithm configurations play an important role in training time and energy consumption. Mathematical models can predict this energy consumption and experimental frameworks can benchmark it; however, the former lack realism and extensibility, while the latter suffer from long run-times and consume actual power. In this article, we introduce Falafels, an extensible tool that predicts the energy consumption and training time of Federated Learning systems, though it is not limited to them. It distinguishes itself through its discrete-simulator-based approach, which yields nearly instant run-times and enables fast development of new algorithms. Furthermore, we show that this approach makes it possible to use an evolutionary algorithm to optimize the system configuration for a given machine learning workload.
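The last claim of the abstract, that a fast simulator makes evolutionary search over system configurations practical, can be illustrated with a minimal sketch. Everything in it is hypothetical: the `Config` fields, the bounds `MAX_CLIENTS` and `MAX_EPOCHS`, and above all `estimate_energy`, a toy cost function standing in for one Falafels simulation run. It is not the paper's energy model or its optimizer; it only shows the loop structure, in which cheap evaluations allow many candidate configurations to be scored per generation.

```python
import random
from dataclasses import dataclass, replace

# Hypothetical search-space bounds, chosen for illustration only.
MAX_CLIENTS = 20
MAX_EPOCHS = 5

@dataclass(frozen=True)
class Config:
    n_clients: int      # number of trainer nodes
    local_epochs: int   # local epochs per aggregation round
    asynchronous: bool  # asynchronous vs. synchronous aggregation

def estimate_energy(cfg: Config) -> float:
    """Toy stand-in for one simulation run (arbitrary units, NOT Falafels' model)."""
    # Fewer trainers -> longer training -> more static energy (toy term).
    static = 1000.0 / cfg.n_clients
    # Each extra trainer adds communication and aggregation cost (toy term).
    network = 15.0 * cfg.n_clients
    # Fewer local epochs -> more rounds; more epochs -> more compute (toy terms).
    epochs_cost = 200.0 / cfg.local_epochs + 30.0 * cfg.local_epochs
    # Synchronous rounds make fast trainers idle at the barrier (toy term).
    idle = 0.0 if cfg.asynchronous else 5.0 * cfg.n_clients
    return static + network + epochs_cost + idle

def random_config(rng: random.Random) -> Config:
    """Draw a configuration uniformly from the search space."""
    return Config(rng.randint(1, MAX_CLIENTS), rng.randint(1, MAX_EPOCHS), rng.random() < 0.5)

def mutate(cfg: Config, rng: random.Random) -> Config:
    """Perturb one field, clamping integers to the search-space bounds."""
    field = rng.choice(["n_clients", "local_epochs", "asynchronous"])
    step = rng.choice([-1, 1])
    if field == "n_clients":
        return replace(cfg, n_clients=min(MAX_CLIENTS, max(1, cfg.n_clients + step)))
    if field == "local_epochs":
        return replace(cfg, local_epochs=min(MAX_EPOCHS, max(1, cfg.local_epochs + step)))
    return replace(cfg, asynchronous=not cfg.asynchronous)

def evolve(pop_size: int = 10, generations: int = 30, seed: int = 0):
    """Elitist evolutionary search: keep the best half, refill with mutated survivors."""
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(pop_size)]
    history = []  # best cost seen at the start of each generation
    for _ in range(generations):
        population.sort(key=estimate_energy)
        history.append(estimate_energy(population[0]))
        survivors = population[: pop_size // 2]
        children = [mutate(rng.choice(survivors), rng) for _ in range(pop_size - len(survivors))]
        population = survivors + children
    best = min(population, key=estimate_energy)
    return best, history

best, history = evolve()
print(best, round(estimate_energy(best), 2))
```

Because the best half of each generation is carried forward unchanged (elitism), the best cost can never increase from one generation to the next. With a real simulator in place of the toy function, each call to `estimate_energy` would be one simulation run, so the search budget is on the order of `pop_size * generations` runs; this is where near-instant simulation matters.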
