
One-Shot Sequential Federated Learning for Non-IID Data by Enhancing Local Model Diversity

Naibo Wang, Yuchen Deng, Wenjie Feng, Shichen Fan, Jianwei Yin, See-Kiong Ng

TL;DR

The paper addresses non-IID challenges in one-shot sequential federated learning (SFL) by introducing FedELMY, which builds a per-client model pool and uses two distance-based regularizers to diversify locally trained models. Each new model is initialized from the average of the models already in the pool and trained with a loss that adds $-\alpha d_1 + \beta d_2$ to the task loss. The $d_1$ term pushes the model away from existing pool models, and the $d_2$ term limits drift from the initial model $\theta_0$ and, through it, from the global solution. Together they improve knowledge transfer across adjacent clients. Empirical results show FedELMY outperforms both one-shot parallel FL (PFL) and SFL baselines on label-skew and domain-shift tasks, with notable gains on CIFAR-10 and PACS. It also keeps communication cost low because each client sends a single model to its successor. Ablations and case studies illustrate improved feature representations and more diverse local models, pointing to practical value for scalable, privacy-preserving FL on heterogeneous data.
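
A minimal sketch of the regularized objective follows. It is not the authors' code. It assumes models are flattened into plain parameter vectors, that $d_1$ is the mean Euclidean distance from the current parameters to the models already in the pool, and that $d_2$ is the Euclidean distance to the initial model $\theta_0$. This summary does not give the paper's exact distance definitions, so these choices and the names `euclidean` and `regularized_loss` are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean (L2) distance between two flattened parameter vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def regularized_loss(
    theta: Vector,
    task_loss: float,
    pool: Sequence[Vector],
    theta0: Vector,
    alpha: float,
    beta: float,
) -> float:
    """Hypothetical objective: task_loss - alpha * d1 + beta * d2.

    d1 is the mean distance from theta to the models already in the pool;
    its negative sign means minimizing the loss pushes theta away from them.
    d2 is the distance from theta to the initial model theta0; its positive
    sign means minimizing the loss keeps theta close to theta0.
    """
    if not pool:
        raise ValueError("pool must contain at least one model")
    d1 = sum(euclidean(theta, m) for m in pool) / len(pool)
    d2 = euclidean(theta, theta0)
    return task_loss - alpha * d1 + beta * d2


# Toy usage: theta is at distance 1 from both pool models and from theta0,
# so the result is 0.5 - 0.1 * 1 + 0.2 * 1 (about 0.6 in floating point).
print(regularized_loss([1.0, 0.0], 0.5, [[0.0, 0.0], [2.0, 0.0]], [0.0, 0.0], 0.1, 0.2))
```

Because $d_1$ enters with a minus sign, the objective rewards distance from existing pool models, while $\beta d_2$ pulls the model back toward $\theta_0$. The weights $\alpha$ and $\beta$ balance diversity against drift. In a real training loop these terms would be computed on differentiable framework tensors so that they contribute gradients.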

Abstract

Traditional federated learning mainly focuses on parallel settings (PFL), which can incur significant communication and computation costs. In contrast, one-shot and sequential federated learning (SFL) have emerged as innovative paradigms to alleviate these costs. However, non-IID (non-independent and identically distributed) data remains a significant challenge in one-shot and SFL settings, exacerbated by the restricted communication between clients. In this paper, we improve one-shot sequential federated learning for non-IID data by proposing a local model diversity-enhancing strategy. Specifically, to leverage local model diversity for improving model performance, we introduce a local model pool for each client that comprises diverse models generated during local training, and propose two distance measurements to further enhance model diversity and mitigate the effect of non-IID data. Consequently, our proposed framework can improve global model performance while maintaining low communication costs. Extensive experiments demonstrate that our method outperforms existing one-shot PFL methods and achieves better accuracy than state-of-the-art one-shot SFL methods on both label-skew and domain-shift tasks (e.g., 6%+ accuracy improvement on the CIFAR-10 dataset).
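
The per-client model pool can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Models are flat parameter vectors, and the pool is seeded with the received model $\theta_0$. Each new model starts from the element-wise average of the models already in the pool. `local_train` is a hypothetical stand-in for regularized local training. Forwarding the average of the trained models only (excluding $\theta_0$) is also an assumption.

```python
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical trainer signature: (start, pool, theta0) -> trained parameters.
Trainer = Callable[[Vector, Sequence[Vector], Vector], Vector]


def average(models: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally shaped parameter vectors."""
    n = len(models)
    return [sum(values) / n for values in zip(*models)]


def build_model_pool(theta0: Vector, local_train: Trainer, num_models: int) -> List[Vector]:
    """Train num_models models on one client, each starting from the pool average."""
    pool: List[Vector] = [list(theta0)]  # assumption: the pool is seeded with theta0
    for _ in range(num_models):
        start = average(pool)  # new model starts from the mean of all pool models so far
        pool.append(local_train(start, pool, theta0))
    return pool


def client_output(theta0: Vector, local_train: Trainer, num_models: int) -> Vector:
    """Average of the trained models (excluding theta0), forwarded to the next client."""
    pool = build_model_pool(theta0, local_train, num_models)
    return average(pool[1:])


# Toy usage with a stand-in trainer that shifts every parameter by +1.
def shift_by_one(start: Vector, pool: Sequence[Vector], theta0: Vector) -> Vector:
    return [x + 1.0 for x in start]


print(client_output([0.0], shift_by_one, num_models=2))  # [1.25]
```

In this sketch, each new model starts from a different point (the running pool average) rather than always from $\theta_0$, which is one source of diversity among pool members. A real `local_train` would add the $d_1$/$d_2$ regularizers on top.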

Paper Structure

This paper contains 31 sections, 9 equations, 12 figures, 9 tables, and 3 algorithms.

Figures (12)

  • Figure 1: Two federated learning settings.
  • Figure 2: An illustration of our training solution on client $i$. Based on previously trained models in the model pool, every new model $m_k$ starts training from $\theta_k^s=f(\{\theta_i\}_{i=0}^{k-1})$ to improve training diversity ($f$ is the average function in our paper). During training, optimization of $\theta_k$ is constrained within a specific region (the unshaded areas). $\theta_k$ is required to maintain a certain distance ($d_1$) from existing models $\{\theta_i\}_{i=0}^{k-1}$ to enhance model diversity, and should not diverge significantly ($d_2$) from the initial model $\theta_0$ to prevent deviation from the globally optimal solution caused by non-IID data. After training, all trained models $\{\theta_i\}_{i=1}^{k}$ display similar training losses on the local dataset $D_i$ (a) but have different test errors on the whole test set (b). Meanwhile, the averaged model $\theta_{avg}$ of all models in the model pool achieves a lower test error than any single model (b).
  • Figure 3: Overview of our method. Every client $i$ receives a model $m_{avg}^{i-1}$ from its previous client $i-1$ and sends model $m_{avg}^{i}$ to its next client $i+1$ after training (I). For each client $i$, we train $S$ models and put them into its model pool $\mathcal{M}^i$ (II). Every new model $m^i_j$ is initialized to the average of the existing models in $\mathcal{M}^i$ and trained under the control of $d_1$ and $d_2$ (III).
  • Figure 4: Two data distributions across clients. For the label-skew distribution, the color intensity of each square represents the number of samples of the corresponding class on that client; for the domain-shift (feature-skew) distribution, every client possesses a specific domain with all classes.
  • Figure 5: Communication cost comparison of different algorithms for CIFAR-10 dataset when the number of clients $N=10$ with model ResNet-18.
  • ...and 7 more figures
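
The client chain in Figure 3 and the communication argument behind Figure 5 can be sketched with simple counting. This is a hedged illustration, not the paper's protocol or its measured costs. Each element of `client_updates` is a hypothetical stand-in for one client's pool-based local training. The first client's starting model is taken as given, and the final model is assumed to be the last client's averaged model. Under these assumptions, N clients need N-1 client-to-client transfers. By contrast, a parallel FL baseline in which every client downloads and uploads the model once per round moves 2NR models over R rounds. The 100-round value in the usage lines is arbitrary, and `transfer_megabytes` assumes float32 weights.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# Hypothetical per-client step: receives m_avg^{i-1}, returns m_avg^i.
ClientUpdate = Callable[[Vector], Vector]


def one_shot_sequential(
    init_model: Vector, client_updates: Sequence[ClientUpdate]
) -> Tuple[Vector, int]:
    """Pass a single model along the chain 1 -> 2 -> ... -> N exactly once.

    Returns the last client's model and the number of client-to-client transfers.
    """
    model = list(init_model)
    transfers = 0
    for i, update in enumerate(client_updates):
        if i > 0:
            transfers += 1  # client i received the model from client i-1
        model = update(model)
    return model, transfers


def parallel_fl_transfers(num_clients: int, rounds: int) -> int:
    """Parallel FL count: one download and one upload per client per round."""
    return 2 * num_clients * rounds


def transfer_megabytes(transfers: int, num_params: int, bytes_per_param: int = 4) -> float:
    """Convert a transfer count into megabytes (4 bytes per float32 parameter)."""
    return transfers * num_params * bytes_per_param / 1e6


# Toy usage: 10 clients whose stand-in update adds 1 to every parameter.
updates = [lambda m: [x + 1.0 for x in m] for _ in range(10)]
final_model, hops = one_shot_sequential([0.0], updates)
print(final_model, hops)  # [10.0] 9
print(parallel_fl_transfers(num_clients=10, rounds=100))  # 2000
```

Multiplying either count by the model's size (via `transfer_megabytes`) turns it into a byte budget. The gap grows linearly with the number of rounds a parallel method needs.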