
FedClust: Tackling Data Heterogeneity in Federated Learning through Weight-Driven Client Clustering

Md Sirajul Islam, Simin Javaherian, Fei Xu, Xu Yuan, Li Chen, Nian-Feng Tzeng

TL;DR

The paper tackles non-IID data in federated learning by introducing FedClust, a one-shot clustered FL method that uses distances between final-layer weights to form client clusters. It constructs a proximity matrix and applies agglomerative hierarchical clustering (no proxy data required) to assign clients to a suitable number of clusters, after which cluster-specific models are trained via FedAvg. FedClust supports newcomers and offers a tunable globalization-personalization trade-off through a clustering threshold, achieving accuracy gains of up to ~45% and a reduction in communication rounds of up to ~2.7x across four datasets. The approach improves robustness to data heterogeneity and reduces communication overhead, making CFL more practical for large-scale, real-world FL deployments.
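The clustering step summarized above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the Euclidean distance, the average-linkage rule, and the names `proximity_matrix` and `agglomerative_clusters` are all assumptions made for the example, and each client's final-layer weights are assumed to be already flattened into a list of floats.

```python
import math
from itertools import combinations


def proximity_matrix(weights):
    """Pairwise Euclidean distances between flattened weight vectors.

    Assumption: Euclidean distance is used only for illustration.
    """
    n = len(weights)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = math.dist(weights[i], weights[j])
    return dist


def agglomerative_clusters(dist, threshold):
    """Bottom-up clustering that stops at a distance threshold.

    Starts with one cluster per client and repeatedly merges the two
    closest clusters (average linkage, an assumption) until the closest
    pair is farther apart than `threshold`.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Mean pairwise distance between members of clusters a and b.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        p, q = min(
            combinations(range(len(clusters)), 2),
            key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]),
        )
        if linkage(clusters[p], clusters[q]) > threshold:
            break
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p is unaffected
    return [sorted(c) for c in clusters]


# Toy final-layer weights for four hypothetical clients.
client_weights = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
dist = proximity_matrix(client_weights)
print(agglomerative_clusters(dist, threshold=1.0))
```

In this sketch the threshold plays the role of the globalization-personalization knob: a threshold of 0 leaves every client in its own cluster, while a very large threshold merges all clients into a single group.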

Abstract

Federated learning (FL) is an emerging distributed machine learning paradigm that enables collaborative training of machine learning models over decentralized devices without exposing their local data. One of the major challenges in FL is the presence of uneven data distributions across client devices, violating the well-known assumption of independent-and-identically-distributed (IID) training samples in conventional machine learning. To address the performance degradation incurred by such data heterogeneity, clustered federated learning (CFL) shows promise by grouping clients into separate learning clusters based on the similarity of their local data distributions. However, state-of-the-art CFL approaches require a large number of communication rounds to learn the distribution similarities during training until the formation of clusters is stabilized. Moreover, some of these algorithms rely heavily on a predefined number of clusters, thus limiting their flexibility and adaptability. In this paper, we propose FedClust, a novel approach for CFL that leverages the correlation between local model weights and the data distribution of clients. FedClust groups clients into clusters in a one-shot manner by measuring the degree of similarity among clients based on strategically selected partial weights of locally trained models. We conduct extensive experiments on four benchmark datasets with different non-IID data settings. Experimental results demonstrate that FedClust achieves higher model accuracy, by up to ~45%, as well as faster convergence with a significantly reduced communication cost, by up to 2.7x, compared to its state-of-the-art counterparts.
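Once clients are grouped into separate learning clusters, each cluster trains its own model. The following is a minimal sketch of sample-weighted averaging (FedAvg) applied per cluster. It assumes flattened weight vectors and known per-client sample counts; the names `fedavg` and `aggregate_per_cluster` are hypothetical and introduced only for this example.

```python
def fedavg(client_updates):
    """Sample-weighted average of client weight vectors.

    client_updates: list of (weights, num_samples) pairs, where weights
    is a flat list of floats (an assumption made for this sketch).
    """
    total = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    return [
        sum(w[k] * n for w, n in client_updates) / total
        for k in range(dim)
    ]


def aggregate_per_cluster(clusters, weights, sample_counts):
    """Run FedAvg separately inside each cluster of client indices."""
    return [
        fedavg([(weights[i], sample_counts[i]) for i in cluster])
        for cluster in clusters
    ]


# Three hypothetical clients: clients 0 and 1 share a cluster, client 2 is alone.
clusters = [[0, 1], [2]]
weights = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0]]
sample_counts = [1, 3, 5]
print(aggregate_per_cluster(clusters, weights, sample_counts))
# prints [[2.5, 3.5], [10.0, 20.0]]
```

Each cluster's model is influenced only by its own members, so clients with dissimilar data do not pull one another's weights toward a single global average.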

Paper Structure (16 sections, 4 equations, 4 figures, 6 tables, 2 algorithms)

Figures (4)

  • Figure 1: Illustration of the distance matrices calculated using different layer weights, where CL indicates convolutional layer and FC indicates fully connected layer. A lighter color in the distance matrices denotes a smaller distance, i.e., the two models are more similar.
  • Figure 2: An overview of FedClust.
  • Figure 3: Test accuracy versus the number of communication rounds for Non-IID label skew of 20%. FedClust converges faster to reach target accuracy and consistently outperforms other baselines.
  • Figure 4: Test accuracy of FedClust versus the clustering threshold $\lambda$, and the number of suitable clusters, for Non-IID label skew (20%) on the CIFAR-10/100, FMNIST, and SVHN datasets. Each point in the plots comes from an experiment run for 200 communication rounds with local epoch and local batch size of 10, using SGD as the local optimizer.