Don't Reach for the Stars: Rethinking Topology for Resilient Federated Learning

Mirko Konstantin, Anirban Mukhopadhyay

TL;DR

The paper tackles the vulnerability of centralized star-topology FL to non-IID data and malfunctioning clients by proposing LIGHTYEAR, a decentralized P2P FL framework. Each client computes an agreement score $A_{ij}$ on its private validation set, fusing accuracy, calibration, and confidence (or Dice for segmentation), and uses it to select a personalized aggregation subset $\mathcal{S}_i$ from its neighbors. It then updates its model via the regularized aggregation rule $\bar{\theta_i}^{(t+1)} = \bar{\theta_i}^{(t)} + \gamma^{t} \cdot \frac{1}{|\mathcal{S}_i|} \sum_{j \in \mathcal{S}_i} (\theta_j - \bar{\theta_i}^{(t)})$. The approach formalizes the problem under non-exchangeable data, decomposes the error into target-domain and corruption components, and demonstrates on five medical datasets that LIGHTYEAR delivers robust, personalized performance superior to both centralized baselines and existing P2P methods, including under adversarial and dynamic malfunction scenarios. The work highlights the practical benefit of replacing global coordination with local, validation-driven aggregation to improve resilience and personalization in federated learning. Overall, it argues for decentralized architectures as a way to improve reliability and domain-adaptive performance in real-world FL deployments.
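
The aggregation rule in the TL;DR can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `regularized_aggregate` is hypothetical, model parameters are assumed to be flattened into 1-D arrays, and `gamma` is assumed to be the scalar step coefficient for the current round.

```python
import numpy as np

def regularized_aggregate(theta_ref, selected_thetas, gamma):
    """Move theta_ref toward the mean of the selected peer models.

    Computes theta_ref + gamma * mean_j(theta_j - theta_ref).
    Assumption: parameters are flattened 1-D arrays of equal length.
    """
    if len(selected_thetas) == 0:
        # No peer was selected this round: keep the reference model unchanged.
        return theta_ref.copy()
    diffs = np.stack([theta_j - theta_ref for theta_j in selected_thetas])
    return theta_ref + gamma * diffs.mean(axis=0)

# Toy example with two selected peers (values are made up for illustration).
theta_i = np.array([0.0, 1.0])
selected = [np.array([1.0, 1.0]), np.array([3.0, 3.0])]
updated = regularized_aggregate(theta_i, selected, gamma=0.5)
plain_average = regularized_aggregate(theta_i, selected, gamma=1.0)
print(updated, plain_average)
```

With `gamma=1.0` the rule reduces to the plain average of the selected peer models; smaller values keep the client closer to its own reference model, which is the stabilizing effect the regularization is meant to provide.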

Abstract

Federated learning (FL) enables collaborative model training across distributed clients while preserving data privacy by keeping data local. Traditional FL approaches rely on a centralized, star-shaped topology, where a central server aggregates model updates from clients. However, this architecture introduces several limitations, including a single point of failure, limited personalization, poor robustness to distribution shifts, and vulnerability to malfunctioning clients. Moreover, update selection in centralized FL often relies on low-level parameter differences, which can be unreliable when client data is not independent and identically distributed, and offers clients little control. In this work, we propose a decentralized, peer-to-peer (P2P) FL framework. It leverages the flexibility of the P2P topology to enable each client to identify and aggregate a personalized set of trustworthy and beneficial updates. This framework is the Local Inference Guided Aggregation for Heterogeneous Training Environments to Yield Enhancement Through Agreement and Regularization (LIGHTYEAR). Central to our method is an agreement score, computed on a local validation set, which quantifies the semantic alignment of incoming updates in the function space with respect to the client's reference model. Each client uses this score to select a tailored subset of updates and performs aggregation with a regularization term that further stabilizes the training. Our empirical evaluation across five datasets shows that the proposed approach consistently outperforms both centralized baselines and existing P2P methods in terms of client-level performance, particularly under adversarial and heterogeneous conditions.
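
The selection step described in the abstract, scoring incoming updates in function space on a local validation set, might look roughly as follows. This is a simplified sketch under stated assumptions: the paper's agreement score combines accuracy, calibration, and confidence, whereas the stand-in `label_agreement` below only measures how often a peer model predicts the same label as the client's reference model. The threshold-based rule and all names (`label_agreement`, `select_peers`, `threshold`) are hypothetical.

```python
import numpy as np

def label_agreement(ref_probs, peer_probs):
    """Fraction of validation samples where both models predict the same label.

    Assumption: a simplified stand-in for the paper's agreement score.
    Inputs are (n_samples, n_classes) arrays of predicted probabilities.
    """
    same_label = ref_probs.argmax(axis=1) == peer_probs.argmax(axis=1)
    return float(np.mean(same_label))

def select_peers(ref_probs, peer_probs_by_id, threshold):
    """Keep the peers whose agreement with the reference model meets the threshold."""
    scores = {
        peer_id: label_agreement(ref_probs, probs)
        for peer_id, probs in peer_probs_by_id.items()
    }
    selected = [peer_id for peer_id, score in scores.items() if score >= threshold]
    return selected, scores

# Toy validation predictions on four samples (values are made up for illustration).
ref = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.4, 0.6]])
peers = {
    "peer_a": np.array([[0.8, 0.2], [0.3, 0.7], [0.6, 0.4], [0.1, 0.9]]),
    "peer_b": np.array([[0.1, 0.9], [0.9, 0.1], [0.2, 0.8], [0.3, 0.7]]),
}
selected, scores = select_peers(ref, peers, threshold=0.75)
print(selected, scores)
```

Because the score is computed from predictions rather than parameter distances, a peer whose weights differ greatly but whose outputs match the client's validation data can still be selected, which is the motivation the abstract gives for working in function space.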

Paper Structure

This paper contains 22 sections, 15 equations, 13 figures, 21 tables.

Figures (13)

  • Figure 1: Arrow colors indicate the alignment quality of updates with the rest of the federation (green: well-aligned, yellow/red: misaligned). In centralized FL, all client updates are aggregated at the server regardless of their compatibility, which can degrade performance under heterogeneity. In contrast, P2P FL enables client-side aggregation, allowing each client to select only the most compatible updates.
  • Figure 2: Illustration of the decomposition of the prediction error. The boxplot displays the error across the instances, with color indicating the magnitude of the error (green: low error, red: high error).
  • Figure 3: Robustness of each method under three types of client malfunctions. Performance is shown as the number of malfunctioning clients is progressively increased, from 1-7 for FEMNIST, 1-4 for Camelyon17, and 1-5 for Isic19. The reported accuracy is the average across all clients and all experimental runs.
  • Figure 4: Robustness of each method under three types of client malfunctions on the segmentation tasks. Performance is shown as the number of malfunctioning clients is progressively increased, from 1-4 for both datasets. The reported Dice score is the average across all clients and all experimental runs.
  • Figure 5: Compares the resilience of both topologies and LIGHTYEAR to three types of client malfunctions on the classification tasks, reported by the average accuracy over all clients.
  • ...and 8 more figures