Flexible Clustered Federated Learning for Client-Level Data Distribution Shift
Moming Duan, Duo Liu, Xinyuan Ji, Yu Wu, Liang Liang, Xianzhang Chen, Yujuan Tan
TL;DR
FlexCFL tackles the core FL challenge of statistical heterogeneity by introducing static, direction-based clustering of clients via Euclidean distance on decomposed cosine similarities (EDC), paired with a lightweight newcomer cold-start and a Wasserstein-distance–driven client migration to cope with distribution shifts. The framework uses an auxiliary server to manage groups, enabling three transmissions: intra-group aggregation, inter-group aggregation, and gradient uploads, and supports a semi-pluralistic setup through a tunable inter-group learning rate $\eta_g$. The authors provide a convergence analysis under standard convexity and Lipschitz conditions, and empirically demonstrate substantial accuracy gains over FedAvg, FedProx, IFCA, FeSEM, and FedGroup across MNIST, FEMNIST, Synthetic, and FashionMNIST, including robustness to client-level distribution shifts. This work achieves higher accuracy while maintaining communication efficiency and scalability, and it contributes an open-source implementation to facilitate adoption in real-world large-scale FL systems.
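As an illustration of the Wasserstein-distance-driven migration mentioned above, the following is a minimal sketch of how a shift check on a single client might look. It is not the authors' implementation: the function names, the threshold `tau`, and the choice to compare label histograms are all assumptions made for this example. It also treats class indices as unit-spaced points on a line, which is an arbitrary ground metric for categorical labels.

```python
from itertools import accumulate


def label_histogram(labels, num_classes):
    """Normalized label frequencies for one client (assumes labels is non-empty)."""
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    total = len(labels)
    return [c / total for c in counts]


def wasserstein_1d(p, q):
    """1-Wasserstein distance between two histograms on the same unit-spaced bins.

    For 1-D distributions this equals the sum of absolute CDF differences.
    """
    cdf_p = accumulate(p)
    cdf_q = accumulate(q)
    return sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))


def needs_migration(old_labels, new_labels, num_classes, tau):
    """Hypothetical trigger: flag the client when its label shift exceeds tau."""
    old_hist = label_histogram(old_labels, num_classes)
    new_hist = label_histogram(new_labels, num_classes)
    return wasserstein_1d(old_hist, new_hist) > tau
```

A client flagged by such a check would then be reassigned to a group; how that reassignment is carried out is not described in this summary.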
Abstract
Federated Learning (FL) enables multiple participating devices to contribute collaboratively to a global neural network model while keeping their training data local. Unlike in centralized training, the training data in FL is distributed across the federated network and is non-IID, imbalanced (statistically heterogeneous), and subject to distribution shift, which increases the divergence between the local models and the global model and degrades performance. In this paper, we propose a flexible clustered federated learning (CFL) framework named FlexCFL, in which we 1) group clients for training based on the similarity of their optimization directions, lowering training divergence; 2) implement an efficient cold start mechanism for newcomer devices, making the framework scalable and practical; and 3) flexibly migrate clients to handle client-level data distribution shift. FlexCFL achieves improvements by dividing the joint optimization into groups of sub-optimizations and strikes a balance between accuracy and communication efficiency in the distribution shift environment. We analyze the convergence and complexity of FlexCFL to demonstrate its efficiency. We also evaluate FlexCFL on several open datasets and compare it with related CFL frameworks. The results show that FlexCFL improves absolute test accuracy by +10.6% on FEMNIST compared to FedAvg, +3.5% on FashionMNIST compared to FedProx, and +8.4% on MNIST compared to FeSEM. The experimental results also show that FlexCFL is communication efficient in the distribution shift environment.
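Item 1) groups clients by the similarity of their optimization directions, which the TL;DR describes as a Euclidean distance on decomposed cosine similarities (EDC). The sketch below shows one way such a distance could be computed, under stated assumptions: each client's update is a flat vector, and the reference `directions` onto which similarities are decomposed are supplied by the caller (how they are obtained is not specified in this section). All function names are hypothetical, and the clustering step that would consume the distance matrix is omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def decomposed_cosine(updates, directions):
    """Embed each client update as its cosine similarity to every reference direction."""
    return [[cosine(u, d) for d in directions] for u in updates]


def edc_matrix(updates, directions):
    """Pairwise Euclidean distances between the clients' decomposed-cosine embeddings."""
    emb = decomposed_cosine(updates, directions)
    return [[math.dist(a, b) for b in emb] for a in emb]


if __name__ == "__main__":
    # Three toy client updates; the first two point the same way.
    updates = [[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]]
    directions = [[1.0, 0.0], [0.0, 1.0]]  # assumed reference directions
    for row in edc_matrix(updates, directions):
        print([round(x, 3) for x in row])
```

Because cosine similarity ignores magnitude, the first two toy clients end up at distance zero from each other even though their update norms differ, which is the sense in which the grouping is direction-based.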
