SMoFi: Step-wise Momentum Fusion for Split Federated Learning on Heterogeneous Data
Mingkun Yang, Ran Zhu, Qing Wang, Jie Yang
TL;DR
This work tackles the challenge of gradient divergence caused by non-IID data in Split Federated Learning by introducing Step-wise Momentum Fusion (SMoFi). SMoFi synchronizes momentum buffers across parallel server-side optimizers and employs a staleness-aware momentum alignment to maintain consistent training trajectories without altering client-side computation. Theoretical convergence guarantees under partial participation show an $O(1/N)$ rate with explicit bounds, and extensive experiments demonstrate up to 7.1 percentage-point accuracy gains and up to 10.25× faster convergence, especially with more clients and deeper models. Practically, SMoFi provides a lightweight, client-transparent enhancement for real-world split FL deployments in resource-constrained environments.
Abstract
Split Federated Learning is a system-efficient federated learning paradigm that leverages the rich computing resources at a central server to train model partitions. Data heterogeneity across silos, however, presents a major challenge that undermines the convergence speed and accuracy of the global model. This paper introduces Step-wise Momentum Fusion (SMoFi), an effective and lightweight framework that counteracts gradient divergence arising from data heterogeneity by synchronizing the momentum buffers across server-side optimizers. To control gradient divergence throughout training, we design a staleness-aware alignment mechanism that constrains the gradient updates of the server-side submodel at each optimization step. Extensive experiments on multiple real-world datasets show that SMoFi consistently improves global model accuracy (up to 7.1%) and convergence speed (up to 10.25$\times$). Furthermore, the benefit of SMoFi grows with the number of participating clients and with model depth, making it particularly suitable for model training in resource-constrained contexts.
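To make the idea of synchronizing momentum buffers at every optimization step concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names (`fuse_momentum`, `sgd_momentum_step`, `fused_step`) are hypothetical, fusion is a uniform element-wise average, and it runs after each local update. The abstract does not specify the staleness-aware alignment rule, so it is omitted here.

```python
def fuse_momentum(buffers):
    """Average momentum buffers element-wise across server-side optimizers.

    Assumption: uniform averaging. The staleness-aware weighting mentioned
    in the abstract is not specified there and is not modeled here.
    """
    n = len(buffers)
    return [sum(vals) / n for vals in zip(*buffers)]


def sgd_momentum_step(weights, grad, momentum, lr=0.1, beta=0.9):
    """One plain SGD-with-momentum step; returns (new_weights, new_momentum)."""
    new_m = [beta * m + g for m, g in zip(momentum, grad)]
    new_w = [w - lr * m for w, m in zip(weights, new_m)]
    return new_w, new_m


def fused_step(replicas, grads, buffers, lr=0.1, beta=0.9):
    """Step every server-side replica, then synchronize momentum buffers.

    Assumption: one replica and one optimizer per client, all stepping in
    lockstep. Each replica keeps its own weights; only momentum is shared.
    """
    stepped = [sgd_momentum_step(w, g, m, lr, beta)
               for w, g, m in zip(replicas, grads, buffers)]
    new_replicas = [w for w, _ in stepped]
    fused = fuse_momentum([m for _, m in stepped])
    # Every optimizer continues from its own copy of the fused buffer.
    return new_replicas, [list(fused) for _ in stepped]


# Two clients whose gradients point in different directions (non-IID data).
replicas = [[1.0, 1.0], [1.0, 1.0]]
grads = [[1.0, 0.0], [0.0, 1.0]]
buffers = [[0.0, 0.0], [0.0, 0.0]]
new_replicas, new_buffers = fused_step(replicas, grads, buffers)
```

After this step both optimizers hold the same momentum buffer, so their next updates are pulled toward a common direction even though the per-client gradients disagree. Client-side computation is untouched, since only server-side optimizer state is modified.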
