
Proximity-based Self-Federated Learning

Davide Domini, Gianluca Aguzzi, Nicolas Farabegoli, Mirko Viroli, Lukas Esterle

TL;DR

The paper tackles the scalability and data-heterogeneity challenges of traditional federated learning by proposing proximity-based self-federated learning (PSFL), a fully distributed framework that forms regionally coherent federations around leaders using geographic proximity and data similarity. PSFL leverages self-organising coordination region concepts and gradient-based dissemination within an aggregate computing context to dynamically create, train, and distribute federation-specific models without sharing raw data. Key contributions include a ds-based dissimilarity measure, a gradient-field guidance mechanism, and a four-step federation workflow (creation, collection, aggregation, distribution) implemented via space-fluid sparse choice; empirical results on Extended MNIST Letters show PSFL outperforms centralized FedAVG, especially in highly non-IID settings. The approach holds promise for edge and decentralized deployments by enabling adaptive, privacy-preserving, region-specific model learning at scale.

Abstract

Federated learning, a recent advancement in machine learning, allows a network of distributed clients to collaboratively develop a global model without sharing their local data. This technique aims to safeguard privacy, countering the vulnerabilities of conventional centralized learning methods. Traditional federated learning approaches often rely on a central server to coordinate model training across clients, aiming to replicate the same model uniformly across all nodes. However, these methods overlook the significance of geographical and local data variances in vast networks, potentially affecting model effectiveness and applicability. Moreover, relying on a central server might become a bottleneck in large networks, such as the ones promoted by edge computing. Our paper introduces a novel, fully-distributed federated learning strategy called proximity-based self-federated learning that enables the self-organised creation of multiple federations of clients based on their geographic proximity and data distribution, without exchanging raw data. Unlike traditional algorithms, our approach encourages clients to share and adjust their models with neighbouring nodes based on geographic proximity and model accuracy. This method not only addresses the limitations posed by diverse data distributions but also enhances the model's adaptability to different regional characteristics, creating specialized models for each federation. We demonstrate the efficacy of our approach through simulations on well-known datasets, showcasing its effectiveness over the conventional centralized federated learning framework.
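The idea of grouping neighbouring clients into federations can be illustrated with a small sketch. This is not the paper's distributed, aggregate-computing implementation: it is a sequential, centralized approximation written only for intuition. The function `form_federations`, the `dissimilarity` callable, the `threshold` parameter, and the toy `area_of` mapping are all hypothetical names introduced here. In a real deployment, the dissimilarity would have to be computed without exchanging raw data.

```python
from collections import deque

def form_federations(neighbours, dissimilarity, threshold):
    """Greedy, centralized sketch of proximity-based federation forming.

    neighbours:    dict mapping each node to its adjacent nodes (proximity graph)
    dissimilarity: callable (leader, node) -> float, lower means more similar
    threshold:     maximum dissimilarity allowed between a member and its leader
    Returns a dict mapping every node to the leader of its federation.
    """
    leader_of = {}
    for candidate in sorted(neighbours):
        if candidate in leader_of:
            continue  # already absorbed by an earlier leader
        leader_of[candidate] = candidate  # the candidate becomes a leader
        frontier = deque([candidate])
        while frontier:
            node = frontier.popleft()
            for nb in neighbours[node]:
                # Expand only through neighbours that are similar enough
                # to the leader and not yet part of any federation.
                if nb not in leader_of and dissimilarity(candidate, nb) <= threshold:
                    leader_of[nb] = candidate
                    frontier.append(nb)
    return leader_of

# Toy example (assumed data): six nodes on a line, two areas with different data.
line_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
area_of = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

def toy_dissimilarity(leader, node):
    # Stand-in measure: 0 inside the same area, 1 across areas.
    return 0.0 if area_of[leader] == area_of[node] else 1.0

federations = form_federations(line_graph, toy_dissimilarity, threshold=0.5)
print(federations)
```

In this toy run the two resulting federations coincide with the two areas, which mirrors the stable configuration the abstract aims for. The choice of leaders here depends only on node ordering, which is an arbitrary simplification.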

Paper Structure (14 sections, 7 equations, 7 figures, 1 table, 1 algorithm)

This paper contains 14 sections, 7 equations, 7 figures, 1 table, 1 algorithm.

Figures (7)

  • Figure 1: Federated learning schema. In the first phase, the server shares the centralized model with the clients. In the second phase, the clients train locally on data that is not accessible to the server. In the third phase, the locally trained models are communicated back to the central server. In the final phase, the server combines them using an aggregation algorithm.
  • Figure 2: Graphical representation of the problem. In this case, there is an area $A$ divided into four sub-areas $\{a_1, a_2, a_3, a_4\}$. Each sub-area has a different data distribution (represented by the colours). The white nodes represent sensors $S$, while links between sensors represent the neighbourhood relation. During the learning process (time flows from top to bottom), the nodes begin to form various federations (represented by the colour of the nodes, while the leaders of the federations are represented by arrows), eventually reaching a stable division that matches the areas they belong to.
  • Figure 3: Overview of the proposed algorithm.
  • Figure 4: A simulation run of PSFL with $|A| = 9$. The background colour represents the different areas, while the colour of the nodes represents the federation they belong to. Leaders are represented with an inner black circle. In the first snapshot, all the nodes elect themselves as leaders. The nodes then start to form federations based on the dissimilarity metric, and finally the federations become stable and aligned with the areas.
  • Figure 5: Data collected during training and validation. Each column represents a different loss threshold, while rows represent the metrics explained in the metrics section. The number of areas and the loss threshold have a significant impact on the performance of the system. PSFL performs better with a higher number of areas and a lower loss threshold.
  • ...and 2 more figures
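
The four phases in the Figure 1 caption (share, train locally, send back, aggregate) can be sketched as one round of centralized federated averaging. This is a minimal illustration under stated assumptions, not the paper's experimental setup: the model is a toy linear regressor `y = w*x + b`, aggregation is an average weighted by sample count, and the helper names `local_train`, `fed_avg`, and `federated_round` are hypothetical.

```python
def local_train(weights, data, lr=0.05, epochs=20):
    """Phase 2: gradient descent on the client's private (x, y) pairs."""
    w, b = weights
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += 2 * err * x
            grad_b += 2 * err
        w -= lr * grad_w / len(data)
        b -= lr * grad_b / len(data)
    return [w, b]

def fed_avg(client_weights, client_sizes):
    """Phase 4: average the client models, weighted by local sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

def federated_round(global_weights, client_datasets):
    # Phase 1: every client receives a copy of the current global model.
    # Phase 2: each client trains on data the server never sees.
    # Phase 3: only the trained weights are sent back.
    updates = [local_train(list(global_weights), d) for d in client_datasets]
    sizes = [len(d) for d in client_datasets]
    return fed_avg(updates, sizes)

# Toy data (assumed): two clients sampling the same line y = 2x + 1.
clients = [
    [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)],
    [(-1.0, -1.0), (0.5, 2.0), (3.0, 7.0)],
]
model = [0.0, 0.0]
for _ in range(50):
    model = federated_round(model, clients)
print(model)  # close to [2.0, 1.0]
```

Because both toy clients draw from the same distribution, a single global model fits them well. When areas hold different distributions, one shared model can fit none of them well, which is the situation that motivates per-federation models.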