FBFL: A Field-Based Coordination Approach for Data Heterogeneity in Federated Learning

Davide Domini, Gianluca Aguzzi, Lukas Esterle, Mirko Viroli

TL;DR

FBFL tackles data heterogeneity and resilience in federated learning by introducing field-based coordination to create self-organizing, spatially localized learning regions with distributed leaders. The approach formalizes regional objectives and leverages the Self-organizing Coordination Regions (SCR) pattern to enable dynamic leader election and hierarchical aggregation, avoiding a central server. Empirical results on MNIST, FashionMNIST, and Extended MNIST show that FBFL matches FedAvg under IID data and significantly outperforms FedAvg, FedProx, and Scaffold under non-IID conditions, while demonstrating resilience to aggregator failures. The work advances decentralized FL by combining personalized regional models with robust, self-stabilizing coordination, offering scalable and privacy-preserving learning suitable for edge and IoT deployments.

Abstract

In recent years, federated learning (FL) has become a popular solution for training machine learning models in domains with high privacy concerns. However, FL's scalability and performance face significant challenges in real-world deployments where data across devices are not independent and identically distributed (non-IID). This heterogeneity in data distribution frequently arises from the spatial distribution of devices and degrades model performance when it is not handled properly. Additionally, FL's typical reliance on centralized architectures introduces bottlenecks and single-point-of-failure risks, which are particularly problematic at scale or in dynamic environments. To close this gap, we propose Field-Based Federated Learning (FBFL), a novel approach leveraging macroprogramming and field coordination to address these limitations through: (i) distributed spatial-based leader election for personalization to mitigate non-IID data challenges; and (ii) construction of a self-organizing, hierarchical architecture using advanced macroprogramming patterns. Moreover, FBFL not only overcomes the aforementioned limitations, but also enables the development of more specialized models tailored to the specific data distribution in each subregion. This paper formalizes FBFL and evaluates it extensively on the MNIST, FashionMNIST, and Extended MNIST datasets. We demonstrate that, under IID data conditions, FBFL performs comparably to the widely used FedAvg algorithm. Furthermore, in challenging non-IID scenarios, FBFL not only outperforms FedAvg but also surpasses other state-of-the-art methods, namely FedProx and Scaffold, which were specifically designed to address non-IID data distributions. Additionally, we showcase the resilience of FBFL's self-organizing hierarchical architecture against server failures.

Paper Structure

This paper contains 44 sections, 10 equations, 8 figures, 3 tables, 1 algorithm.

Figures (8)

  • Figure 1: On the left, a visual representation of centralized federated learning. In the first phase, the server shares the centralized model with the clients. In the second phase, the clients perform a local learning phase using data that is not accessible to the server. In the third phase, these models are communicated back to the central server, and in the last phase, the server runs an aggregation algorithm on them. On the right, the P2P federated learning schema. Unlike the centralized version, there is no central server. Each client sends its local model to all the other clients, then the aggregation is performed locally.
  • Figure 2: Graphical representation of the Self-organizing Coordination Regions pattern. First, information within each area is collected in the respective leader. Then, each leader processes the collected information and shares it back to the clients.
  • Figure 3: Spatial data distribution: homogeneous within subregions, non-IID across subregions.
  • Figure 4: A graphical representation of three different data distributions across 5 subregions. Each color represents a different subregion. The second and the third images are two examples of non-IID data.
  • Figure 5: Comparison of the proposed method (FBFL) with FedAvg under IID data. The first row shows results on the MNIST dataset, the second row on the Fashion MNIST dataset.
  • ...and 3 more figures
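
The collect, process, and share cycle described in the Figure 2 caption can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that leader election has already happened (the hypothetical `region_of` map assigns each device to its leader), that models are flat lists of parameters, and that each leader aggregates with a FedAvg-style average weighted by local dataset size. The function names `fedavg` and `regional_round` are illustrative only.

```python
from collections import defaultdict


def fedavg(models, sizes):
    """Average flat parameter vectors, weighting each by its local dataset size."""
    total = sum(sizes)
    dim = len(models[0])
    return [sum(m[i] * n for m, n in zip(models, sizes)) / total for i in range(dim)]


def regional_round(devices, region_of):
    """Run one collect-aggregate-share cycle per region.

    devices:   dict device_id -> (params, num_samples)
    region_of: dict device_id -> leader_id (assumed to be already elected)
    Returns a dict device_id -> regional model shared back by the leader.
    """
    # Collect: group devices under their leader.
    members = defaultdict(list)
    for dev, leader in region_of.items():
        members[leader].append(dev)

    updated = {}
    for leader, devs in members.items():
        # Process: the leader aggregates the models of its own region only.
        regional = fedavg(
            [devices[d][0] for d in devs],
            [devices[d][1] for d in devs],
        )
        # Share: every device in the region receives a copy of the regional model.
        for d in devs:
            updated[d] = list(regional)
    return updated


# Hypothetical example: devices "a" and "b" share leader "a"; "c" leads itself.
devices = {
    "a": ([1.0, 2.0], 10),
    "b": ([3.0, 4.0], 30),
    "c": ([10.0, 10.0], 5),
}
region_of = {"a": "a", "b": "a", "c": "c"}
new_models = regional_round(devices, region_of)
```

Because aggregation never crosses region boundaries, devices "a" and "b" end up with one shared model while "c" keeps a model fitted to its own data, which is the personalization effect the abstract attributes to spatial leader election.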