CyclicFL: A Cyclic Model Pre-Training Approach to Efficient Federated Learning
Pengyu Zhang, Yingbo Zhou, Ming Hu, Xian Wei, Mingsong Chen
TL;DR
CyclicFL tackles slow convergence and degraded accuracy in federated learning under non-IID data by introducing cyclic pre-training on selected AIoT devices to derive a strong initial global model without exposing local data. It formalizes a two-phase workflow in which cyclic pre-training optimizes a task-consistent objective $\mathcal{F}(\mathbf{w})$, starting from a random model $\mathbf{w}_{rg}$, to obtain a pre-trained model $\mathbf{w}_{wg}$ that then seeds standard FL training. The paper proves that data consistency between the pre-training and training phases improves the Lipschitzness of the loss, and it provides phase-wise convergence rates under $L$-smoothness for the strongly convex, convex, and non-convex regimes, showing accelerated convergence. Empirical results on FEMNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100 demonstrate gains of up to $14.11$ percentage points in maximum accuracy and substantially faster convergence, while maintaining privacy and compatibility with baseline FL methods. Overall, CyclicFL offers a practical, privacy-preserving path to faster and more accurate FL on security-critical AIoT deployments.
Abstract
Federated learning (FL) has been proposed to enable distributed learning on Artificial Intelligence Internet of Things (AIoT) devices with guarantees of high-level data privacy. Since random initial models in FL can easily result in unregulated Stochastic Gradient Descent (SGD) processes, existing FL methods suffer greatly from both slow convergence and poor accuracy, especially in non-IID scenarios. To address this problem, we propose a novel method named CyclicFL, which can quickly derive effective initial models to guide the SGD processes, thus improving the overall FL training performance. We formally analyze the significance of data consistency between the pre-training and training stages of CyclicFL, showing that the loss of models pre-trained by CyclicFL has limited Lipschitzness. Moreover, we systematically prove that our method can achieve faster convergence under various convexity assumptions. Unlike traditional centralized pre-training methods that require public proxy data, CyclicFL pre-trains initial models on selected AIoT devices cyclically without exposing their local data. Therefore, it can be easily integrated into any security-critical FL method. Comprehensive experimental results show that CyclicFL can not only improve the maximum classification accuracy by up to $14.11\%$ but also significantly accelerate the overall FL training process.
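The two-stage workflow described above can be illustrated with a toy sketch. This is not the authors' implementation: the one-parameter regression task, the function names (`local_sgd`, `cyclic_pretrain`, `fedavg`), the random device selection, and all hyperparameters are assumptions chosen only to show the control flow. In the pre-training stage a single model is handed from one selected device to the next and trained in sequence; in the training stage the resulting model seeds ordinary FedAvg-style aggregation.

```python
import random

TRUE_SLOPE = 3.0  # hypothetical ground truth for the toy task y = 3x


def make_clients(num_clients, points_per_client, rng):
    """Build non-IID clients: client i only sees x in its own slice of [0, 1]."""
    clients = []
    for i in range(num_clients):
        lo, hi = i / num_clients, (i + 1) / num_clients
        data = []
        for _ in range(points_per_client):
            x = rng.uniform(lo, hi)
            data.append((x, TRUE_SLOPE * x + rng.gauss(0.0, 0.05)))
        clients.append(data)
    return clients


def local_sgd(w, data, lr=0.05, epochs=5):
    """Plain SGD on one client's data for the model y = w * x (squared loss)."""
    for _ in range(epochs):
        for x, y in data:
            w -= lr * 2.0 * (w * x - y) * x
    return w


def cyclic_pretrain(w_rg, clients, rounds, k, rng):
    """Stage 1 (assumed form): pass one model sequentially through k devices per round."""
    w = w_rg
    for _ in range(rounds):
        for idx in rng.sample(range(len(clients)), k):
            w = local_sgd(w, clients[idx])  # only the model moves, never the data
    return w


def fedavg(w_init, clients, rounds, k, rng):
    """Stage 2: FedAvg-style rounds, averaging local models weighted by data size."""
    w = w_init
    for _ in range(rounds):
        selected = rng.sample(range(len(clients)), k)
        local_models = [local_sgd(w, clients[i]) for i in selected]
        sizes = [len(clients[i]) for i in selected]
        w = sum(n * wi for n, wi in zip(sizes, local_models)) / sum(sizes)
    return w


def global_loss(w, clients):
    """Mean squared error over the union of all client data (for evaluation only)."""
    points = [p for data in clients for p in data]
    return sum((w * x - y) ** 2 for x, y in points) / len(points)


rng = random.Random(0)
clients = make_clients(num_clients=4, points_per_client=20, rng=rng)
w_rg = rng.uniform(-1.0, 1.0)                                   # random initial model
w_wg = cyclic_pretrain(w_rg, clients, rounds=5, k=2, rng=rng)   # pre-trained model
w_final = fedavg(w_wg, clients, rounds=5, k=2, rng=rng)         # standard FL training

print("loss at w_rg:   ", global_loss(w_rg, clients))
print("loss at w_wg:   ", global_loss(w_wg, clients))
print("loss at w_final:", global_loss(w_final, clients))
```

The only structural difference between the two stages in this sketch is that the pre-training loop updates one model in sequence, while the FL loop trains copies in parallel and averages them. The toy problem is convex and nearly trivial, so it says nothing about the accuracy gains reported in the paper.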
