Where is the Testbed for my Federated Learning Research?

Janez Božič; Amândio R. Faustino; Boris Radovič; Marco Canini; Veljko Pejović

TL;DR

CoLExT is a real-world, heterogeneous FL testbed that enables reproducible experiments across single-board computers (SBCs) and Android devices with rich, real-time metrics. Built on Flower and deployed via Kubernetes and ADB, it supports easy porting of existing FL algorithms and provides a Grafana dashboard for analysis. The study demonstrates low instrumentation overhead, surfaces nontrivial trade-offs (e.g., energy-to-accuracy, per-device efficiency), and exposes practical problems such as implementation bugs and stragglers, underscoring the gap between simulation and real deployment. By making CoLExT open source, the authors aim to democratize realistic FL experimentation and guide practitioners toward deployment-ready solutions.

Abstract

Progressing beyond centralized AI is of paramount importance, yet distributed AI solutions, in particular various federated learning (FL) algorithms, are often not comprehensively assessed. This prevents the research community from identifying the most promising approaches and practitioners from being convinced that a certain solution is deployment-ready. The largest hurdle to FL algorithm evaluation is the difficulty of conducting real-world experiments over a variety of FL client devices and platforms, with different datasets and data distributions, all while assessing various dimensions of algorithm performance, such as inference accuracy, energy consumption, and time to convergence. In this paper, we present CoLExT, a real-world testbed for FL research. CoLExT is designed to streamline experimentation with custom FL algorithms in a rich testbed configuration space, with a large number of heterogeneous edge devices, ranging from single-board computers to smartphones, and provides real-time collection and visualization of a variety of metrics through automatic instrumentation. According to our evaluation, porting FL algorithms to CoLExT requires minimal involvement from the developer, and the instrumentation introduces minimal resource usage overhead. Furthermore, through an initial investigation involving popular FL algorithms running on CoLExT, we reveal previously unknown trade-offs, inefficiencies, and programming bugs.

Paper Structure (29 sections, 11 figures, 4 tables)

Introduction
Background and Obstacles to Realistic FL Experimentation
FL Primer and Algorithm Variations
Experimentation Challenges
Impact of Heterogeneity on FL
Experiment Orchestration and Testbed Implementation
Related Work
CoLExT: Federated Learning Testbed
Using CoLExT in a Nutshell
CoLExT Implementation
Underlying FL Framework
CoLExT Client and Server
Datasets and Data Partitioning
Collecting Performance Metrics
Experiment Orchestration
...and 14 more sections

Figures (11)

Figure 1: Maximum validation accuracy and energy to accuracy (ETA) for three FL algorithms on the CIFAR-10 dataset. In FedAvg and FedProx, all clients use either a Small or a Large model, while in HeteroFL, each client uses one of the two depending on its computational power. The ETA axis values cannot be (reliably) assessed without the real-world experimentation provided by CoLExT.
Figure 2: CoLExT workflow.
Figure 3: Example capture of CoLExT Dashboard.
Figure 4: CoLExT testbed devices.
Figure 5: CoLExT Samsung Galaxy XCover 6 Pro powered by a Monsoon PM.
...and 6 more figures
