Parallel Split Learning with Global Sampling

Mohammad Kohankhaki; Ahmad Ayad; Mahdi Barhoush; Anke Schmeink

TL;DR

GPSL is a server-driven scheme that fixes the global batch size and computes per-client batch-size schedules from pooled-level proportions. It obtains finite-population deviation guarantees via Serfling's inequality and has zero rounding bias, in contrast to local sampling schemes.
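For orientation, the general form of Serfling's inequality is reproduced here. This is the textbook statement, not necessarily the exact form used in the paper; the symbols $N$, $n$, $a$, $b$, $\mu$ and $\epsilon$ are notation assumed for this note. Let $X_1, \dots, X_n$ be drawn uniformly without replacement from a finite population of $N$ values in $[a, b]$ with mean $\mu$, and let $\bar{X}_n$ be the sample mean. Then for any $\epsilon > 0$:

$$
\Pr\left(\bar{X}_n - \mu \ge \epsilon\right) \le \exp\left(-\frac{2 n \epsilon^2}{\left(1 - \frac{n-1}{N}\right)(b-a)^2}\right)
$$

The factor $1 - \frac{n-1}{N}$ is the finite-population correction: as the sample size $n$ approaches the population size $N$, the bound becomes tighter than the corresponding Hoeffding bound for sampling with replacement.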

Abstract

Distributed deep learning in resource-constrained environments faces scalability and generalization challenges due to large effective batch sizes and non-identically distributed client data. We introduce a server-driven sampling strategy that maintains a fixed global batch size by dynamically adjusting client-side batch sizes. This decouples the effective batch size from the number of participating devices and ensures that global batches better reflect the overall data distribution. Using standard concentration bounds, we establish tighter deviation guarantees compared to existing approaches. Empirical results on a benchmark dataset confirm that the proposed method improves model accuracy, training efficiency, and convergence stability, offering a scalable solution for learning at the network edge.
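The idea of holding the global batch size fixed while letting client-side batch sizes vary can be illustrated with a minimal sketch. It assumes that a global batch is drawn uniformly without replacement from the pooled sample indices of all clients; this is one plausible reading of the abstract, not the authors' published algorithm, and the function name `global_batch_schedule` and its parameters are hypothetical.

```python
import random
from collections import Counter


def global_batch_schedule(client_sizes, global_batch_size, seed=0):
    """Sketch: per-step local batch sizes for one epoch (hypothetical helper).

    client_sizes[k] is the number of samples held by client k. Each returned
    row lists how many samples every client contributes to one global batch.
    """
    rng = random.Random(seed)
    # Pool of owner labels: one entry per sample, tagged with its client index.
    pool = [k for k, n in enumerate(client_sizes) for _ in range(n)]
    # Shuffling and then chunking is equivalent to drawing successive
    # batches uniformly without replacement from the pooled data.
    rng.shuffle(pool)
    schedule = []
    for start in range(0, len(pool), global_batch_size):
        counts = Counter(pool[start:start + global_batch_size])
        schedule.append([counts.get(k, 0) for k in range(len(client_sizes))])
    return schedule


# Usage: four clients with unequal dataset sizes, global batch size 32.
schedule = global_batch_schedule([100, 30, 10, 60], global_batch_size=32)
for step, local_sizes in enumerate(schedule):
    print(step, local_sizes, "sum =", sum(local_sizes))
```

In this sketch every row sums to the global batch size (except possibly the final, smaller batch), regardless of how many clients participate, and each client's local batch sizes add up to its dataset size over the epoch.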

Paper Structure (17 sections, 12 equations, 7 figures, 4 tables, 1 algorithm)

Introduction
Related Work
Preliminaries
System Model
Parallel Split Learning with Global Sampling
Deviation Analysis
Comparison with Fixed Local Batches
Simulation Results
Experimental Setup
Sampling Methods
Non-IID Data
Global Batch Size
Batch Deviation
Runtime
Limitations
...and 2 more sections

Figures (7)

Figure 1: GPSL schematic: variable local batch sizes sum to a fixed global batch size, mitigating large effective batch size and non-IID issues in PSL.
Figure 2: Example of a severe non-IID setting for $K=16$ clients, showing the distribution of classes across clients.
Figure 3: ResNet-18 on CIFAR-10: Test accuracy curves for each sampling method in a severe non-IID setting with standard deviation shaded.
Figure 4: ResNet-18 on CIFAR-10: Batch deviation curves for each sampling method in a severe non-IID setting with exponential moving average smoothing and shaded standard deviation.
Figure 5: ResNet-18 on CIFAR-10: Average total training time for each sampling method in a severe non-IID setting, measured in minutes.
...and 2 more figures
