Flexible Clustered Federated Learning for Client-Level Data Distribution Shift

Moming Duan, Duo Liu, Xinyuan Ji, Yu Wu, Liang Liang, Xianzhang Chen, Yujuan Tan

TL;DR

FlexCFL tackles the core FL challenge of statistical heterogeneity by introducing static, direction-based clustering of clients via Euclidean distance on decomposed cosine similarities (EDC), paired with a lightweight newcomer cold-start and a Wasserstein-distance–driven client migration to cope with distribution shifts. The framework uses an auxiliary server to manage groups, enabling three transmissions: intra-group aggregation, inter-group aggregation, and gradient uploads, and supports a semi-pluralistic setup through a tunable inter-group learning rate $\eta_g$. The authors provide a convergence analysis under standard convexity and Lipschitz conditions, and empirically demonstrate substantial accuracy gains over FedAvg, FedProx, IFCA, FeSEM, and FedGroup across MNIST, FEMNIST, Synthetic, and FashionMNIST, including robustness to client-level distribution shifts. This work achieves higher accuracy while maintaining communication efficiency and scalability, and it contributes an open-source implementation to facilitate adoption in real-world large-scale FL systems.

Abstract

Federated Learning (FL) enables multiple participating devices to collaboratively contribute to a global neural network model while keeping the training data local. Unlike in the centralized training setting, the training data in FL is non-IID, imbalanced (statistically heterogeneous), and subject to distribution shift across the federated network, which increases the divergence between the local models and the global model and further degrades performance. In this paper, we propose a flexible clustered federated learning (CFL) framework named FlexCFL, in which we 1) group the training of clients based on the similarities between the clients' optimization directions for lower training divergence; 2) implement an efficient newcomer device cold start mechanism for framework scalability and practicality; 3) flexibly migrate clients to meet the challenge of client-level data distribution shift. FlexCFL achieves improvements by dividing joint optimization into groups of sub-optimization and strikes a balance between accuracy and communication efficiency in the distribution shift environment. The convergence and complexity are analyzed to demonstrate the efficiency of FlexCFL. We also evaluate FlexCFL on several open datasets and compare it with related CFL frameworks. The results show that FlexCFL significantly improves absolute test accuracy by +10.6% on FEMNIST compared to FedAvg, +3.5% on FashionMNIST compared to FedProx, and +8.4% on MNIST compared to FeSEM. The experimental results also show that FlexCFL is communication efficient in the distribution shift environment.
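
As a rough illustration of grouping clients by the similarity of their optimization directions, the sketch below computes a Euclidean distance on decomposed cosine similarities (the EDC measure named in the TL;DR). It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `edc_distances`, the use of a plain SVD for the decomposition, and the toy update matrix are all assumptions made for illustration.

```python
import numpy as np

def edc_distances(updates, m):
    """Pairwise distances between clients (hypothetical helper).

    updates: array of shape (n_clients, n_params), one flattened
             model update per client (assumed input format).
    m:       number of decomposed directions to keep (assumed to
             match the number of groups).
    """
    updates = np.asarray(updates, dtype=float)
    # Decompose the update matrix; the top-m right singular vectors
    # serve as the main update directions (assumption: plain SVD).
    _, _, vt = np.linalg.svd(updates, full_matrices=False)
    basis = vt[:m]  # shape (m, n_params), rows have unit norm
    # Cosine similarity between every client update and every direction.
    norms = np.linalg.norm(updates, axis=1, keepdims=True)
    cos = (updates @ basis.T) / np.clip(norms, 1e-12, None)
    # Euclidean distance between the clients' cosine-similarity vectors.
    diff = cos[:, None, :] - cos[None, :, :]
    return np.linalg.norm(diff, axis=-1)

# Toy example: clients 0 and 1 move in one direction, 2 and 3 in another.
toy_updates = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.1, 0.9, 0.0],
])
dist = edc_distances(toy_updates, m=2)
```

In this toy case, `dist[0, 1]` is smaller than `dist[0, 2]`, so clients with similar update directions end up close together. The resulting distance matrix could then be passed to any standard clustering routine to form the groups.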

Paper Structure

This paper contains 15 sections, 3 theorems, 22 equations, 6 figures, 4 tables, 3 algorithms.

Key Result

Lemma 1

Under Assumptions 1 to 3, the group loss function $F_g$ is convex, $M$-Lipschitz continuous, and $L$-Lipschitz smooth for any $g$.
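
For reference, the three properties named in Lemma 1 have the standard meanings written out below. This is generic notation rather than the paper's own statement: $\bm{w}$ and $\bm{v}$ are assumed to be arbitrary model parameter vectors, $\|\cdot\|$ is assumed to be the Euclidean norm, and $F_g$ is assumed to be differentiable.

```latex
\begin{align}
F_g(\bm{v}) &\ge F_g(\bm{w}) + \langle \nabla F_g(\bm{w}),\, \bm{v} - \bm{w} \rangle
  && \text{(convexity)} \\
|F_g(\bm{v}) - F_g(\bm{w})| &\le M \,\|\bm{v} - \bm{w}\|
  && \text{($M$-Lipschitz continuity)} \\
\|\nabla F_g(\bm{v}) - \nabla F_g(\bm{w})\| &\le L \,\|\bm{v} - \bm{w}\|
  && \text{($L$-Lipschitz smoothness)}
\end{align}
```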

Figures (6)

  • Figure 1: A FedAvg training procedure on three non-IID MNIST datasets and one IID MNIST dataset to illustrate the effects of statistical heterogeneity on model accuracy and discrepancy. From left to right, the number of classes of training data per client increases, which means the degree of data heterogeneity decreases. The discrepancy is defined by the paper's discrepancy equation.
  • Figure 2: An overview of FlexCFL.
  • Figure 3: Evaluation results on MNIST ($m=3$). Top: test accuracy; Middle: weighted training loss based on $K$ selected clients; Bottom: discrepancy between selected clients and server (FedAvg and FedProx) or weighted discrepancy between selected clients and groups (FlexCFL, FlexCFL-$\eta_g$, IFCA, FeSEM). The inter-group learning rate $\eta_g=0.1$ in (a), (b), (c).
  • Figure 4: Test accuracy on MNIST ($m=3$), FEMNIST ($m=5$), FashionMNIST ($m=5$) with three kinds of distribution shift: Swap all (top); Swap part (middle); Incremental (bottom). The swap probability is 0.05 in the swap all and swap part settings; in the incremental setting, 25% of the data is released every 50 rounds. FedGroup is the static version of FlexCFL without client migration.
  • Figure 5: Evaluation results of FlexCFL with different inter-group learning rate $\eta_g$ on FEMNIST-MLP in swap part setting.
  • ...and 1 more figure

Theorems & Definitions (5)

  • Definition 1: Group Loss Function
  • Lemma 1
  • Definition 2: Intra-Group Gradient Divergence
  • Lemma 2: Upper bound of the divergence of $\bm{w}_{t,e}^{k,g}$
  • Theorem 1: Convergence Bound of FlexCFL without inter-group aggregation