
ParallelSFL: A Novel Split Federated Learning Framework Tackling Heterogeneity Issues

Yunming Liao, Yang Xu, Hongli Xu, Zhiwei Yao, Liusheng Huang, Chunming Qiao

TL;DR

ParallelSFL tackles the dual challenges of communication bottlenecks and heterogeneity in edge federated learning by partitioning workers into clusters and performing split federated learning (SFL) within each cluster, using a KL-divergence-based strategy to steer each cluster's data distribution toward IID. A four-module design on the parameter server (PS) handles worker state monitoring, clustering, updating-frequency optimization, and adaptive full-model aggregation to balance efficiency and accuracy. Experiments on an 80-device edge platform show at least 21% less traffic, at least 1.36x faster training, and at least 5% higher accuracy in heterogeneous scenarios compared to baselines. The approach targets scalable, efficient training of large models on resource-constrained edge devices, with practical implications for deploying AI at the edge.
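The KL-divergence criterion mentioned above can be illustrated with a minimal sketch. It assumes that each worker reports per-class sample counts and that the IID target is the label mix pooled over all workers. The greedy placement order, the equal-size cap per cluster, and the names `normalize`, `kl_divergence`, and `greedy_cluster` are hypothetical illustrations. This is not ParallelSFL's actual utility-function optimization, which also weighs training efficiency.

```python
import math
from typing import List


def normalize(counts: List[float]) -> List[float]:
    """Turn per-class sample counts into a probability distribution."""
    total = sum(counts)
    if total == 0:
        # Degenerate case (no samples): fall back to a uniform distribution.
        return [1.0 / len(counts)] * len(counts)
    return [c / total for c in counts]


def kl_divergence(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """KL(p || q) for discrete distributions; eps guards against log(0) and division by zero."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def greedy_cluster(label_counts: List[List[int]], num_clusters: int) -> List[List[int]]:
    """Hypothetical greedy clustering: place each worker where the cluster's pooled
    label distribution ends up closest (in KL) to the global label distribution."""
    num_workers = len(label_counts)
    num_classes = len(label_counts[0])
    # Assumed IID target: the label mix pooled over all workers.
    target = normalize([sum(w[c] for w in label_counts) for c in range(num_classes)])
    capacity = math.ceil(num_workers / num_clusters)  # keep clusters roughly equal in size
    clusters: List[List[int]] = [[] for _ in range(num_clusters)]
    pooled = [[0] * num_classes for _ in range(num_clusters)]
    # Place workers holding the most data first (stable sort keeps index order on ties).
    order = sorted(range(num_workers), key=lambda i: -sum(label_counts[i]))
    for i in order:
        best_k, best_score = -1, float("inf")
        for k in range(num_clusters):
            if len(clusters[k]) >= capacity:
                continue  # cluster is full
            merged = [a + b for a, b in zip(pooled[k], label_counts[i])]
            score = kl_divergence(normalize(merged), target)
            if score < best_score:
                best_k, best_score = k, score
        clusters[best_k].append(i)
        pooled[best_k] = [a + b for a, b in zip(pooled[best_k], label_counts[i])]
    return clusters


# Two label-skewed worker types (class 0 only vs. class 1 only), two clusters.
print(greedy_cluster([[10, 0], [0, 10], [10, 0], [0, 10]], num_clusters=2))  # prints [[0, 1], [2, 3]]
```

Each resulting cluster pairs a class-0 worker with a class-1 worker, so its pooled data is balanced even though every individual worker is highly non-IID.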

Abstract

Mobile devices contribute more than half of the world's web traffic, providing massive and diverse data for powering various federated learning (FL) applications. To avoid the communication bottleneck on the parameter server (PS) and to accelerate the training of large-scale models on resource-constrained workers in edge computing (EC) systems, we propose a novel split federated learning (SFL) framework, termed ParallelSFL. Concretely, we split an entire model into a bottom submodel and a top submodel, and divide participating workers into multiple clusters, each of which collaboratively performs the SFL training procedure and exchanges entire models with the PS. However, given the statistical and system heterogeneity in edge systems, it is challenging to assign suitable workers to specific clusters for efficient model training. To address these challenges, we develop an effective clustering strategy by optimizing a utility function related to training efficiency and model accuracy. Specifically, ParallelSFL partitions workers into different clusters under heterogeneity restrictions, thereby improving both model accuracy and training efficiency. Meanwhile, ParallelSFL assigns diverse and appropriate local updating frequencies to each cluster to further address system heterogeneity. Extensive experiments on a physical platform with 80 NVIDIA Jetson devices show that ParallelSFL can reduce traffic consumption by at least 21%, speed up model training by at least 1.36x, and improve model accuracy by at least 5% in heterogeneous scenarios, compared to the baselines.
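The per-cluster local updating frequencies described in the abstract can be sketched with a simple proportional heuristic. This is not the paper's optimization: it assumes that a cluster's per-iteration time is set by its slowest worker and that the PS grants every cluster the same wall-clock budget per round, so faster clusters perform more local iterations before exchanging models. The names `cluster_iteration_time`, `assign_frequencies`, `round_budget`, and `min_tau` are hypothetical.

```python
import math
from typing import Dict, List


def cluster_iteration_time(worker_times: List[float]) -> float:
    """Assumed: one SFL iteration in a cluster waits for its slowest worker."""
    return max(worker_times)


def assign_frequencies(cluster_times: Dict[str, List[float]],
                       round_budget: float, min_tau: int = 1) -> Dict[str, int]:
    """Hypothetical heuristic: give each cluster as many local iterations as fit in
    a shared per-round time budget, but never fewer than min_tau."""
    freqs = {}
    for name, times in cluster_times.items():
        per_iter = cluster_iteration_time(times)
        freqs[name] = max(min_tau, math.floor(round_budget / per_iter))
    return freqs


# Per-iteration times (seconds) of the workers in three clusters.
times = {"A": [0.2, 0.25], "B": [0.5, 0.4], "C": [2.0, 30.0]}
print(assign_frequencies(times, round_budget=10.0))  # prints {'A': 40, 'B': 20, 'C': 1}
```

With a 10-second budget, cluster A (bottlenecked at 0.25 s per iteration) runs 40 local iterations and cluster B (0.5 s) runs 20, so both reach the PS at about the same time. Cluster C contains a straggler and is clamped to `min_tau`.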

Paper Structure

This paper contains 22 sections, 18 equations, 10 figures, 1 table, and 1 algorithm.

Figures (10)

  • Figure 1: Illustration of FL and ParallelSFL.
  • Figure 2: Illustration of typical SFL and ParallelSFL.
  • Figure 3: Overview of ParallelSFL.
  • Figure 4: Test accuracy of five approaches on the four IID datasets.
  • Figure 5: Test accuracy of five approaches on the four non-IID datasets.
  • ...and 5 more figures