CycleSL: Server-Client Cyclical Update Driven Scalable Split Learning

Mengdi Wang, Efe Bozkir, Enkelejda Kasneci

TL;DR

CycleSL addresses scalability and client drift in split learning (SL) by treating server-side training as a standalone task and resampling client features to mitigate heterogeneity. It uses a server-client cyclical update protocol inspired by block coordinate descent: the server model is updated first, and client gradients are then computed with the updated server model. This aggregation-free approach can be combined with existing scalable SL methods (e.g., parallel split learning (PSL), split federated learning (SFL), SGLR) to improve performance while lowering the server resource burden. Empirical results on five non-iid datasets with partial client participation show substantial accuracy and convergence gains, suggesting practical benefits for cross-device and privacy-preserving collaborative learning.

Abstract

Split learning emerges as a promising paradigm for collaborative distributed model training, akin to federated learning, by partitioning neural networks between clients and a server without raw data exchange. However, sequential split learning suffers from poor scalability, while parallel variants like parallel split learning and split federated learning often incur high server resource overhead due to model duplication and aggregation, and generally exhibit reduced model performance and convergence owing to factors like client drift and lag. To address these limitations, we introduce CycleSL, a novel aggregation-free split learning framework that enhances scalability and performance and can be seamlessly integrated with existing methods. Inspired by alternating block coordinate descent, CycleSL treats server-side training as an independent higher-level machine learning task, resampling client-extracted features (smashed data) to mitigate heterogeneity and drift. It then performs cyclical updates, namely optimizing the server model first, followed by client updates using the updated server for gradient computation. We integrate CycleSL into previous algorithms and benchmark them on five publicly available datasets with non-iid data distribution and partial client attendance. Our empirical findings highlight the effectiveness of CycleSL in enhancing model performance. Our source code is available at https://gitlab.lrz.de/hctl/CycleSL.

Paper Structure

This paper contains 32 sections, 5 equations, 13 figures, 14 tables, 1 algorithm.

Figures (13)

  • Figure 1: The CycleSL pipeline. After collecting smashed data from clients, CycleSL first forms a global feature dataset on the server side. It then resamples features from this dataset to train the server model. Only after the server model has been updated are the original feature batches reused to compute gradients with the latest server model. Lastly, the gradients are sent back to the clients for their local updates.
  • Figure 2: Histograms of samples per user for the FEMNIST, CelebA, Shakespeare, and OpenEDS2020 datasets.
  • Figure 3: Label distributions among clients in CIFAR-100 (smaller $\alpha$ implies stronger data heterogeneity).
  • Figure 4: Test metrics for the FEMNIST task.
  • Figure 5: Test metrics for the CelebA task.
  • ...and 8 more figures
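
The pipeline described in the Figure 1 caption can be illustrated with a toy sketch. This is not the authors' implementation: it assumes scalar linear client and server models, a squared-error loss, and plain SGD, and every name in it (`client_forward`, `server_loss`, `cyclesl_round`, the learning rates and batch size) is hypothetical. It only shows the order of operations: pool the smashed data, resample it to train the server, then reuse the original feature batches with the updated server to produce client gradients.

```python
import random


def client_forward(w_c, xs):
    """Toy client-side model: a scalar linear map producing smashed data."""
    return [w_c * x for x in xs]


def server_loss(w_s, feats, ys):
    """Mean squared-error loss of the toy scalar server-side model."""
    return sum(0.5 * (w_s * h - y) ** 2 for h, y in zip(feats, ys)) / len(ys)


def cyclesl_round(w_s, client_ws, client_batches, lr_s=0.05, lr_c=0.05,
                  server_batch=4, server_epochs=1, rng=None):
    """One cyclical round (illustrative assumptions: scalar models, SGD).

    client_batches is a list of (xs, ys) pairs, one per participating client.
    Returns the updated server weight and the updated client weights.
    """
    rng = rng or random.Random(0)

    # 1) Each client sends its smashed data (features); labels are shared too.
    feats = [client_forward(w, xs)
             for w, (xs, _) in zip(client_ws, client_batches)]

    # 2) The server pools all features into one global feature dataset.
    pool = [(h, y)
            for f, (_, ys) in zip(feats, client_batches)
            for h, y in zip(f, ys)]

    # 3) The server resamples mini-batches from the pool and trains itself.
    for _ in range(server_epochs):
        rng.shuffle(pool)
        for start in range(0, len(pool), server_batch):
            batch = pool[start:start + server_batch]
            grad_s = sum((w_s * h - y) * h for h, y in batch) / len(batch)
            w_s -= lr_s * grad_s

    # 4) Only now are the original feature batches reused: the updated server
    #    computes dL/dh for each client's own batch.
    new_client_ws = []
    for w_c, f, (xs, ys) in zip(client_ws, feats, client_batches):
        feat_grads = [(w_s * h - y) * w_s for h, y in zip(f, ys)]
        # 5) The client finishes backpropagation locally (dh/dw_c = x).
        grad_c = sum(g * x for g, x in zip(feat_grads, xs)) / len(xs)
        new_client_ws.append(w_c - lr_c * grad_c)

    return w_s, new_client_ws
```

Calling `cyclesl_round` repeatedly with the returned weights plays the role of successive training rounds. Note that the server keeps a single model and never averages client or server copies, which is the sense in which the sketch is aggregation-free.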