One-Shot Sequential Federated Learning for Non-IID Data by Enhancing Local Model Diversity
Naibo Wang, Yuchen Deng, Wenjie Feng, Shichen Fan, Jianwei Yin, See-Kiong Ng
TL;DR
The paper addresses non-IID data in one-shot sequential federated learning by introducing FedELMY, which builds a model pool on each client and uses two distance-based regularizers to diversify the locally trained models. Each new model is initialized from the pool average and optimized with a loss that includes the term $-\alpha d_1 + \beta d_2$, which improves knowledge transfer between adjacent clients while limiting drift from the global solution. Empirically, FedELMY outperforms both one-shot PFL and SFL baselines on label-skew and domain-shift tasks, with notable gains on CIFAR-10 and PACS, and it keeps communication cost low because it requires only a single round of neighbor exchanges. Ablation studies and case studies support these results, showing improved feature representations and more diverse local models, which makes the method relevant to scalable, privacy-preserving FL on heterogeneous data.
Abstract
Traditional federated learning mainly focuses on parallel settings (PFL), which can incur significant communication and computation costs. In contrast, one-shot and sequential federated learning (SFL) have emerged as innovative paradigms to alleviate these costs. However, non-IID (non-independent and identically distributed) data remains a significant challenge in one-shot and SFL settings, exacerbated by the restricted communication between clients. In this paper, we improve one-shot sequential federated learning for non-IID data by proposing a strategy that enhances local model diversity. Specifically, to leverage the potential of local model diversity for improving model performance, we introduce a local model pool for each client that comprises diverse models generated during local training, and we propose two distance measurements to further enhance model diversity and mitigate the effect of non-IID data. Consequently, our proposed framework can improve global model performance while maintaining low communication costs. Extensive experiments demonstrate that our method outperforms existing one-shot PFL methods and achieves better accuracy than state-of-the-art one-shot SFL methods on both label-skew and domain-shift tasks (e.g., 6%+ accuracy improvement on the CIFAR-10 dataset).
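To make the regularizer $-\alpha d_1 + \beta d_2$ from the TL;DR concrete, the following is a minimal sketch and not the authors' implementation. The summary does not define the two distances, so the sketch assumes that $d_1$ is the mean Euclidean distance from the current parameters to the models already in the pool (subtracted, to encourage diversity) and that $d_2$ is the Euclidean distance to the initial model (added, to limit drift). Models are represented as flat NumPy vectors, and the function names `pool_average` and `diversity_regularizer` are hypothetical.

```python
import numpy as np


def pool_average(pool):
    """Element-wise mean of the flattened parameter vectors in the pool."""
    return np.mean(np.stack(pool), axis=0)


def diversity_regularizer(w, pool, w_init, alpha, beta):
    """Return -alpha * d1 + beta * d2 for a candidate parameter vector w.

    Assumptions (not stated in the summary):
      d1 = mean L2 distance from w to each model in the pool
      d2 = L2 distance from w to the initial model w_init
    """
    d1 = np.mean([np.linalg.norm(w - p) for p in pool])
    d2 = np.linalg.norm(w - w_init)
    return -alpha * d1 + beta * d2


# Toy pool of two "models", each a 2-dimensional parameter vector.
pool = [np.array([0.0, 0.0]), np.array([2.0, 0.0])]

# A new model starts from the pool average, as the TL;DR describes.
w_init = pool_average(pool)

# Candidate parameters after some (imagined) local training steps.
w = np.array([1.0, 1.0])

# In training, this term would be added to the task loss before backpropagation.
reg = diversity_regularizer(w, pool, w_init, alpha=0.5, beta=0.25)
```

Under these assumptions, moving `w` away from the pool members lowers the regularizer, while moving it away from `w_init` raises it, so `alpha` and `beta` trade off diversity against drift.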
