Falafels: A tool for Estimating Federated Learning Energy Consumption via Discrete Simulation

Andrew Mary Huet de Barochez, Stéphan Plassart, Sébastien Monnet

TL;DR

This paper addresses the challenge of estimating energy consumption and training time in Federated Learning by introducing Falafels, a discrete-simulation tool built on Simgrid. Falafels models computation and communication costs with a fast, energy-aware simulator, providing nearly instant feedback to guide design choices. Its contributions include extensible FL modelling with multiple topologies and FSM-based learning algorithms, integration of a Simgrid-based energy model, and an evolutionary optimization workflow. The results demonstrate fast exploration of configurations and energy reductions under certain heterogeneous and asynchronous setups, and the authors position Falafels as a complementary tool to existing measurement and experimental frameworks.

Abstract

The growth in computational power and the data hunger of Machine Learning have led to an important shift of research efforts towards distributing ML models across multiple machines, leading to even more powerful models. However, there exist many Distributed Artificial Intelligence paradigms, and for each of them the platform and algorithm configurations play an important role in training time and energy consumption. Many mathematical models and frameworks can, respectively, predict and benchmark this energy consumption; nonetheless, the former lack realism and extensibility, while the latter suffer from long run times and consume actual power. In this article, we introduce Falafels, an extensible tool that predicts the energy consumption and training time of Federated Learning systems, among others. It distinguishes itself with its discrete-simulator-based solution, leading to nearly instant run times and fast development of new algorithms. Furthermore, we show that this approach permits the use of an evolutionary algorithm, providing the ability to optimize the system configuration for a given machine learning workload.

Paper Structure

This paper contains 9 sections, 7 figures.

Figures (7)

  • Figure 1: Implemented network topologies. In a star, each trainer T$_x$ is connected to a central aggregator that orchestrates the training. The ring is unidirectional: packets flow in only one direction, and multiple aggregators A$_y$ can be used. The hierarchical topology allows multiple subclusters to be connected to the central aggregator via hierarchical aggregators HA$_z$.
  • Figure 2: Automaton representing the simple aggregator algorithm.
  • Figure 3: Example of the network manager automaton for the star topology.
  • Figure 4: Class diagram of the simulator architecture.
  • Figure 5: Execution environment of the simulator in Simgrid. The actual simulations contain multiple hosts (with two actors in each); however, only one host has been represented here for simplicity.
  • ...and 2 more figures
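
The three topologies described in the Figure 1 caption can be pictured as directed link lists. The following is a minimal illustrative sketch, not Falafels' actual code: the function names, the node labels (`T0`, `A0`, `HA0`) and the edge-list representation are all assumptions made here for illustration.

```python
from typing import List, Tuple

# An edge (src, dst) means "src is linked to dst". Hypothetical representation.
Edge = Tuple[str, str]


def star(n_trainers: int) -> List[Edge]:
    """Star: every trainer T_x is linked to one central aggregator A0."""
    return [(f"T{i}", "A0") for i in range(n_trainers)]


def ring(nodes: List[str]) -> List[Edge]:
    """Unidirectional ring: each node sends only to its successor.

    `nodes` may mix trainers and several aggregators, in ring order.
    """
    return [(nodes[i], nodes[(i + 1) % len(nodes)]) for i in range(len(nodes))]


def hierarchical(cluster_sizes: List[int]) -> List[Edge]:
    """Hierarchical: each subcluster has a hierarchical aggregator HA_z
    that links its trainers to the central aggregator A0."""
    edges: List[Edge] = []
    trainer_id = 0
    for z, size in enumerate(cluster_sizes):
        ha = f"HA{z}"
        edges.append((ha, "A0"))  # subcluster head -> central aggregator
        for _ in range(size):
            edges.append((f"T{trainer_id}", ha))  # trainer -> its subcluster head
            trainer_id += 1
    return edges
```

For example, `ring(["A0", "T0", "T1"])` yields a three-link cycle in which the last node links back to the first, and `hierarchical([2, 1])` builds two subclusters with two trainers and one trainer respectively.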