Federated Learning Clients Clustering with Adaptation to Data Drifts

Minghao Li, Dmitrii Avdiukhin, Rana Shahout, Nikita Ivkin, Vladimir Braverman, Minlan Yu

TL;DR

This work proposes FIELDING, a clustered federated learning (CFL) framework that handles diverse types of data drift with low overhead. FIELDING detects drift at individual clients and re-clusters selectively to balance cluster quality and model performance, while remaining robust to malicious clients and varying levels of heterogeneity.

Abstract

Federated Learning (FL) trains deep models across edge devices without centralizing raw data, preserving user privacy. However, client heterogeneity slows down convergence and limits global model accuracy. Clustered FL (CFL) mitigates this by grouping clients with similar representations and training a separate model for each cluster. In practice, client data evolves over time, a phenomenon we refer to as data drift, which breaks cluster homogeneity and degrades performance. Data drift can take different forms depending on whether changes occur in the output values, the input features, or the relationship between them. We propose FIELDING, a CFL framework for handling diverse types of data drift with low overhead. FIELDING detects drift at individual clients and performs selective re-clustering to balance cluster quality and model performance, while remaining robust to malicious clients and varying levels of heterogeneity. Experiments show that FIELDING improves final model accuracy by 1.9-5.9% and achieves target accuracy 1.16x-2.23x faster than existing state-of-the-art CFL methods.

Paper Structure

This paper contains 27 sections, 4 theorems, 18 equations, 20 figures, 3 tables, 3 algorithms.

Key Result

Theorem 1

Let $N$ be the number of clients and $M$ be the total number of machines sampled per round. Let $\mathbf x^*$ be the minimizer of $f_0 = \mathop{\mathrm{avg}}\limits_i f^{(i)}_{0}$. Let $\mathbf c^{(k,*)}_{t}$ be the minimizer for cluster $k$ at iteration $t$. Then, under the paper's assumptions, for $\eta \le 1/L$

Figures (20)

  • Figure 1: Client heterogeneity of the global set and label distribution-based clusters.
  • Figure 2: Accuracy difference between different re-clustering approaches.
  • Figure 3: Clustering overview of FIELDING with label distribution as client representation. Clients send distribution vectors to the coordinator; the coordinator moves drifted clients to the closest cluster and triggers global re-clustering if any cluster center shifts by a distance larger than $\tau$.
  • Figure 4: Time to accuracy (TTA) comparison over four tasks.
  • Figure 5: FIELDING with different client selection strategies on FMoW dataset.
  • ...and 15 more figures
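
The coordinator step described in the Figure 3 caption can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes Euclidean distance between label-distribution vectors, assumes cluster centers are the mean of their members' vectors, and takes the global re-clustering routine as a caller-supplied function. The names `handle_drift`, `recluster`, `euclidean`, and `centroid` are hypothetical.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def centroid(vectors):
    """Coordinate-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def handle_drift(reps, assignment, centers, drifted, tau, recluster):
    """One coordinator step after some clients report drift.

    reps:       dict client_id -> current label-distribution vector
    assignment: dict client_id -> cluster index before this step
    centers:    list of cluster-center vectors before this step
    drifted:    iterable of client ids whose representation changed
    tau:        center-shift threshold that triggers global re-clustering
    recluster:  hypothetical callable (reps, k) -> (assignment, centers)

    Returns (assignment, centers, reclustered_flag).
    """
    # Step 1: move each drifted client to its closest existing center.
    new_assignment = dict(assignment)
    for cid in drifted:
        new_assignment[cid] = min(
            range(len(centers)),
            key=lambda k: euclidean(reps[cid], centers[k]),
        )

    # Step 2: recompute centers from the updated membership.
    new_centers = []
    for k, old_center in enumerate(centers):
        members = [reps[c] for c, a in new_assignment.items() if a == k]
        # An emptied cluster keeps its old center in this sketch.
        new_centers.append(centroid(members) if members else old_center)

    # Step 3: trigger global re-clustering if any center moved more than tau.
    if any(euclidean(o, n) > tau for o, n in zip(centers, new_centers)):
        global_assignment, global_centers = recluster(reps, len(centers))
        return global_assignment, global_centers, True

    return new_assignment, new_centers, False
```

With a large `tau`, the step only reassigns the drifted clients, which is the cheap path. A small `tau` makes the coordinator fall back to global re-clustering more often, trading overhead for cluster quality.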

Theorems & Definitions (7)

  • Theorem 1
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • Theorem 4
  • proof