Communication-Computation Pipeline Parallel Split Learning over Wireless Edge Networks
Chenyu Liu, Zhaoyang Zhang, Zirui Chen, Zhaohui Yang
TL;DR
This work tackles the latency and privacy challenges of training neural networks in wireless edge networks by integrating pipeline parallelism into split learning, forming C$^2$P$^2$SL. It introduces a joint optimization framework that simultaneously selects the model cut layer, micro-batch count, per-user batch sizes, and TDMA time-slot allocations to minimize bubble time and maximize training efficiency. An alternating-optimization procedure decomposes the nonconvex mixed-integer nonlinear program (MINLP) into tractable subproblems, solved via mixed-integer linear programming (MILP) and convex optimization, and experiments show substantial reductions in training time (over 38% on average) with preserved convergence accuracy across varying bandwidths. The approach demonstrates robust performance under heterogeneous UE capabilities and provides a practical blueprint for scalable, privacy-preserving edge learning with pipeline scheduling.
Abstract
Split learning (SL) offloads the main computing tasks from multiple resource-constrained user equipments (UEs) to the base station (BS) while preserving local data privacy. However, its computation and communication processes remain sequential, which limits system efficiency. To overcome this limitation, this paper applies pipeline parallelism (PP) from distributed training to SL in wireless networks, proposing communication-computation pipeline parallel split learning (C$^2$P$^2$SL). By treating the communication and computation processes of the UEs and the BS as a single pipeline, C$^2$P$^2$SL parallelizes processing across the micro-batches into which each batch of data samples is split. The resulting overlap of communication and computation significantly reduces the total training time. Because training efficiency depends on the position of the cut layer and on the heterogeneity of the UEs, we formulate a joint optimization problem of task split and resource allocation, and design a solution based on alternating optimization. Experimental results demonstrate that C$^2$P$^2$SL reduces system training time by over 38\% while maintaining convergence accuracy under different communication conditions.
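
To make the benefit of overlapping communication and computation concrete, the following is a minimal Python sketch of a toy timing model. It is not the latency model used in the paper. It assumes a single chain of five stages (UE forward, uplink, BS computation, downlink, UE backward), that each stage handles one micro-batch at a time in order, and that a stage's duration scales linearly with micro-batch size. The stage durations, the micro-batch count, and the function names are hypothetical values chosen for illustration.

```python
from typing import List


def sequential_time(stage_times: List[float], num_micro_batches: int) -> float:
    """Total time when each micro-batch finishes all stages before the next starts."""
    return num_micro_batches * sum(stage_times)


def pipelined_time(stage_times: List[float], num_micro_batches: int) -> float:
    """Makespan when stages overlap across micro-batches.

    Each stage processes one micro-batch at a time, in arrival order.
    """
    # finish[s] is the time at which stage s completed its latest micro-batch.
    finish = [0.0] * len(stage_times)
    for _ in range(num_micro_batches):
        ready = 0.0  # time at which this micro-batch leaves the previous stage
        for s, duration in enumerate(stage_times):
            start = max(ready, finish[s])  # wait for both the data and the stage
            finish[s] = start + duration
            ready = finish[s]
    return finish[-1]


# Hypothetical per-micro-batch durations in seconds:
# UE forward, uplink, BS computation, downlink, UE backward.
stages = [2.0, 3.0, 4.0, 3.0, 2.0]
micro_batches = 4  # hypothetical micro-batch count

seq = sequential_time(stages, micro_batches)
pipe = pipelined_time(stages, micro_batches)
print(f"sequential: {seq:.1f} s, pipelined: {pipe:.1f} s")
```

Under these assumptions the pipelined makespan is the time for one micro-batch to traverse all stages plus one slot of the slowest stage for each additional micro-batch, so the slowest stage determines how much idle (bubble) time remains. This is one way to see why the choice of cut layer and the allocation of resources across heterogeneous UEs affect the achievable speedup.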
