A Framework for testing Federated Learning algorithms using an edge-like environment

Felipe Machado Schwanck, Marcos Tomazzoli Leipnitz, Joel Luís Carbonera, Juliano Araujo Wickboldt

TL;DR

The paper tackles the challenge of evaluating Federated Learning (FL) algorithms under data heterogeneity in distributed edge-like environments. It proposes a three-layer conceptual framework that decouples infrastructure, resources, and applications, enabling easy, configurable FL testing and observability. A Proof-of-Concept implemented on Kubernetes demonstrates configurable data distributions, aggregation strategies, and end-to-end monitoring with Flower, PyTorch, Prometheus, Grafana, and Rook Ceph. Three experiments across FMNIST, CIFAR-10, and CIFAR-100 illustrate the framework’s ability to study non-IID effects and resource usage while highlighting practical considerations and limitations. The work lays a foundation for scalable, extensible FL testing and points to future enhancements such as automated visualizations and chaos engineering to improve reliability in edge deployments.

Abstract

Federated Learning (FL) is a machine learning paradigm in which many clients cooperatively train a single centralized model while keeping their data private and decentralized. FL is commonly used in edge computing, which places computing workloads (both hardware and software) as close as possible to the edge, where the data is created and where actions occur, enabling faster response times, greater data privacy, and reduced data transfer costs. However, because data distributions and contents are heterogeneous across clients, it is non-trivial to accurately evaluate the contributions of local models to the aggregated global model. This is an example of a major challenge in FL, commonly known as data imbalance or class imbalance. More generally, testing and assessing FL algorithms is a difficult and complex task due to the distributed nature of the systems. In this work, a framework is proposed and implemented to assess FL algorithms in an easier and more scalable way. The framework is evaluated in a distributed edge-like environment managed by a container orchestration platform (i.e., Kubernetes).
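
To make the aggregation problem concrete, the sketch below shows one common way local models are combined into a global model: a weighted average in which each client's parameters count in proportion to its number of local samples (the FedAvg rule). This is an illustration under stated assumptions, not necessarily the aggregation strategy used in the paper's experiments; the function name `fedavg`, the toy parameter vectors, and the sample counts are all hypothetical.

```python
from typing import List, Sequence


def fedavg(client_params: Sequence[Sequence[float]],
           num_examples: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighted by local sample count.

    Hypothetical helper for illustration; real FL frameworks operate on
    per-layer tensors rather than flat lists of floats.
    """
    if not client_params or len(client_params) != len(num_examples):
        raise ValueError("need one sample count per client")
    total = sum(num_examples)
    if total <= 0:
        raise ValueError("total number of examples must be positive")

    dim = len(client_params[0])
    aggregated = [0.0] * dim
    for params, n in zip(client_params, num_examples):
        if len(params) != dim:
            raise ValueError("all clients must share the same parameter shape")
        for i, value in enumerate(params):
            # Each client contributes in proportion to its share of the data.
            aggregated[i] += value * n / total
    return aggregated


# Toy example (made-up numbers): three clients with unequal dataset sizes.
clients = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sizes = [100, 300, 600]
global_params = fedavg(clients, sizes)
print(global_params)
```

Because the weights depend only on dataset size, a client holding many samples of a few classes can dominate the global model, which is one reason heterogeneous (non-IID) data makes the contribution of each local model hard to evaluate.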

Paper Structure

This paper contains 44 sections, 1 equation, 12 figures, 4 tables, and 2 algorithms.

Figures (12)

  • Figure 1: High-level overview of the conceptual framework proposed
  • Figure 2: Implementation architecture with all tools used in our PoC
  • Figure 3: High-level flowchart of experiment execution in the PoC solution
  • Figure 4: Experiments output folder structure
  • Figure 5: Results from Experiment 1
  • ...and 7 more figures