Collaborative Split Federated Learning with Parallel Training and Aggregation
Yiannis Papageorgiou, Yannis Thomas, Alexios Filippakopoulos, Ramin Khalili, Iordanis Koutsopoulos
TL;DR
This paper tackles the training delay and communication overhead in Federated Learning by introducing Collaborative Split Federated Learning (C-SFL), which partitions a neural network into three parts at two split layers: a weak-side portion trained on computationally weak clients, an aggregator-side portion trained by local aggregators, and a server-side portion trained on the server. The architecture enables parallel forward and backward propagation (FP/BP) and parallel aggregation across weak clients, local aggregators, and the server, using a local loss at the cut layer to accelerate updates without sacrificing accuracy. The two split layers, $h$ and $v$, are selected through an exhaustive search with complexity $O(V^2)$, and the authors provide a delay decomposition $D_{round}=D_0+E\cdot B\cdot(D_1+D_2)+D_3$ to quantify performance gains. Empirical results on MNIST, FMNIST, and CIFAR-10 show that C-SFL reduces training delay and communication overhead while achieving higher accuracy than standard SFL and LocSplitFed, particularly under high client heterogeneity and constrained transmission rates, which suggests practical value for heterogeneous edge environments.
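The cut-layer search and the delay decomposition described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that $V$ is the number of layers, that $E$ and $B$ are the numbers of local epochs and batches per epoch, and that the four delay terms $D_0$ to $D_3$ can be computed for any candidate pair $(h, v)$; the summary does not define these quantities, so `delay_model`, `round_delay`, and `search_cut_layers` are hypothetical names, and the search simply minimizes the per-round delay.

```python
from itertools import combinations
from typing import Callable, Tuple

# Hypothetical interface: maps a candidate pair (h, v) to the four delay
# terms (D0, D1, D2, D3). How each term is measured is an assumption here.
DelayModel = Callable[[int, int], Tuple[float, float, float, float]]


def round_delay(d0: float, d1: float, d2: float, d3: float,
                epochs: int, batches: int) -> float:
    """Evaluate D_round = D_0 + E * B * (D_1 + D_2) + D_3."""
    return d0 + epochs * batches * (d1 + d2) + d3


def search_cut_layers(num_layers: int, delay_model: DelayModel,
                      epochs: int, batches: int) -> Tuple[int, int, float]:
    """Try every pair 1 <= h < v < num_layers and keep the smallest delay.

    Layers 1..h form the weak-side part, h+1..v the aggregator-side part,
    and v+1..num_layers the server-side part, so all three are non-empty.
    The number of candidate pairs grows as O(V^2).
    """
    best = None
    for h, v in combinations(range(1, num_layers), 2):
        delay = round_delay(*delay_model(h, v), epochs, batches)
        if best is None or delay < best[2]:
            best = (h, v, delay)
    if best is None:
        raise ValueError("need at least 3 layers to place two cut layers")
    return best


def toy_delay_model(h: int, v: int) -> Tuple[float, float, float, float]:
    """Made-up delay terms for illustration only, not measured values."""
    return (0.0, float(abs(h - 2)), float(abs(v - 4)), 1.0)


if __name__ == "__main__":
    print(search_cut_layers(6, toy_delay_model, epochs=2, batches=3))
```

In practice the toy model would be replaced by delay estimates derived from client compute speeds, transmission rates, and per-layer sizes; the exhaustive loop itself stays the same.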
Abstract
Federated learning (FL) operates based on model exchanges between the server and the clients, and it suffers from a significant client-side computation and communication burden. Split federated learning (SFL) arises as a promising solution by splitting the model into two parts that are trained sequentially: the clients train the first part of the model (client-side model) and transmit it to the server, which trains the second part (server-side model). Existing SFL schemes, however, still exhibit long training delays and significant communication overhead, especially when clients of different computing capabilities participate. We therefore propose Collaborative Split Federated Learning (C-SFL), a novel scheme that splits the model into three parts: the part trained at the computationally weak clients, the part trained at the computationally strong clients, and the part trained at the server. Unlike existing works, C-SFL enables parallel training and aggregation of the model's parts at the clients and at the server, resulting in reduced training delays and communication overhead while improving the model's accuracy. Experiments verify the multiple gains of C-SFL over existing schemes.
