LCFed: An Efficient Clustered Federated Learning Framework for Heterogeneous Data

Yuxin Zhang, Haoyu Chen, Zheng Lin, Zhe Chen, Jin Zhao

TL;DR

LCFed addresses the challenge of data heterogeneity in federated learning by introducing model partitioning to enable simultaneous global and intra-cluster knowledge sharing. The framework splits models into embedding and decision components, aggregates a global embedding and cluster centers, and optimizes local objectives with local, cluster, and global losses. It also reduces server-side clustering cost via a PCA-based low-rank similarity representation, enabling scalable online clustering. Empirical results on MNIST, CIFAR-10, and CIFAR-100 show improved accuracy over multiple baselines and substantial clustering cost reductions, highlighting LCFed's practical impact for large-scale, non-IID FL deployments.
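The server-side aggregation step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the dictionary layout (`"embedding"`, `"decision"`), the function name `aggregate`, and the use of an unweighted mean are all assumptions made for the example. The summary states only that a global embedding and cluster centers are aggregated.

```python
import numpy as np

def aggregate(client_models, cluster_ids):
    """Hypothetical server step: one global embedding plus one center per cluster.

    client_models: list of dicts with 'embedding' and 'decision' arrays (assumed layout).
    cluster_ids:   cluster label for each client, in the same order.
    """
    # Global knowledge: average the embedding component over ALL clients.
    global_embedding = np.mean([m["embedding"] for m in client_models], axis=0)

    # Intra-cluster knowledge: average the full model within each cluster.
    centers = {}
    for cid in sorted(set(cluster_ids)):
        members = [m for m, c in zip(client_models, cluster_ids) if c == cid]
        centers[cid] = {
            "embedding": np.mean([m["embedding"] for m in members], axis=0),
            "decision": np.mean([m["decision"] for m in members], axis=0),
        }
    return global_embedding, centers

# Toy example: three clients, the first two in cluster 0, the third in cluster 1.
models = [
    {"embedding": np.array([1.0, 1.0]), "decision": np.array([0.0])},
    {"embedding": np.array([3.0, 1.0]), "decision": np.array([2.0])},
    {"embedding": np.array([5.0, 7.0]), "decision": np.array([4.0])},
]
global_embedding, centers = aggregate(models, cluster_ids=[0, 0, 1])
```

A real deployment would likely weight each client by its sample count, as FedAvg does. The unweighted mean is used here only to keep the sketch short.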

Abstract

Clustered federated learning (CFL) addresses the performance challenges posed by data heterogeneity in federated learning (FL) by organizing edge devices with similar data distributions into clusters, enabling collaborative model training tailored to each group. However, existing CFL approaches strictly limit knowledge sharing to within clusters, lacking the integration of global knowledge with intra-cluster training, which leads to suboptimal performance. Moreover, traditional clustering methods incur significant computational overhead, especially as the number of edge devices increases. In this paper, we propose LCFed, an efficient CFL framework to combat these challenges. By leveraging model partitioning and adopting distinct aggregation strategies for each sub-model, LCFed effectively incorporates global knowledge into intra-cluster co-training, achieving optimal training performance. Additionally, LCFed customizes a computationally efficient model similarity measurement method based on low-rank models, enabling real-time cluster updates with minimal computational overhead. Extensive experiments show that LCFed outperforms state-of-the-art benchmarks in both test accuracy and clustering computational efficiency.
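To make the idea of a low-rank similarity measurement concrete, the sketch below projects flattened client models onto a few principal components and compares clients in that reduced space. It is an illustration under stated assumptions: which parameters are compressed, the choice of cosine similarity, and the names `low_rank_representations` and `cosine_similarity_matrix` are hypothetical and not taken from the paper.

```python
import numpy as np

def low_rank_representations(client_vectors, rank):
    """Project flattened client models onto their top `rank` principal components."""
    X = np.asarray(client_vectors, dtype=float)
    X_centered = X - X.mean(axis=0, keepdims=True)
    # Thin SVD of the centered data; rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ vt[:rank].T

def cosine_similarity_matrix(Z, eps=1e-12):
    """Pairwise cosine similarity between the rows of Z."""
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    Zn = Z / np.maximum(norms, eps)
    return Zn @ Zn.T

# Synthetic stand-in for model parameters: two groups of three clients each.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=1.0, scale=0.1, size=(3, 50))
group_b = rng.normal(loc=-1.0, scale=0.1, size=(3, 50))
clients = np.vstack([group_a, group_b])

Z = low_rank_representations(clients, rank=2)   # shape (6, 2) instead of (6, 50)
S = cosine_similarity_matrix(Z)                 # shape (6, 6)
```

Pairwise similarities are computed on vectors of length `rank` instead of the full parameter dimension, which is where a clustering-cost saving of this kind would come from.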

Paper Structure

This paper contains 10 sections, 4 equations, 3 figures, 1 table, and 1 algorithm.

Figures (3)

  • Figure 1: The LCFed framework in heterogeneous settings. The server clusters and aggregates clients to get a global embedding and multiple cluster center models. Clients update local models by minimizing the classification error loss ($L_{\text{sup}}$), intra-clustering regularization ($L_{\text{cluster}}$), and global regularization ($L_{\text{global}}$).
  • Figure 2: The impact of regularization strengths $\mu$ and $\lambda$ on LCFed training performance.
  • Figure 3: Comparison of CFL algorithms in terms of clustering computation and communication costs across different scales.
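The three loss terms named in the Figure 1 caption can be combined into a single local objective. The sketch below shows one plausible form and is not the paper's definition: it assumes both regularizers are squared Euclidean distances, that $\mu$ weights $L_{\text{cluster}}$ and $\lambda$ weights $L_{\text{global}}$, and that the global term applies only to the embedding component. The function names and dictionary layout are hypothetical.

```python
import numpy as np

def sq_dist(a, b):
    """Squared Euclidean distance between two parameter arrays."""
    return float(np.sum((np.asarray(a) - np.asarray(b)) ** 2))

def local_objective(sup_loss, local_model, cluster_center, global_embedding, mu, lam):
    """Assumed form: L_sup + mu * L_cluster + lam * L_global."""
    # Pull the whole local model toward its cluster center (assumed regularizer).
    l_cluster = (sq_dist(local_model["embedding"], cluster_center["embedding"])
                 + sq_dist(local_model["decision"], cluster_center["decision"]))
    # Pull only the embedding component toward the global embedding (assumed).
    l_global = sq_dist(local_model["embedding"], global_embedding)
    return sup_loss + mu * l_cluster + lam * l_global

local = {"embedding": np.array([1.0, 2.0]), "decision": np.array([0.5])}
center = {"embedding": np.array([1.0, 1.0]), "decision": np.array([0.0])}
total = local_objective(sup_loss=0.7, local_model=local, cluster_center=center,
                        global_embedding=np.array([0.0, 2.0]), mu=0.1, lam=0.2)
# L_cluster = 1.0 + 0.25 = 1.25, L_global = 1.0, so total = 0.7 + 0.125 + 0.2 = 1.025
```

Setting either weight to zero removes the corresponding pull, which is the kind of trade-off Figure 2 examines by varying $\mu$ and $\lambda$.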