Makespan Minimization in Split Learning: From Theory to Practice

Robert Ganian; Fionn Mc Inerney; Dimitra Tsigkari

TL;DR

The paper tackles makespan minimization in Split Learning under memory constraints, introducing SL-Makespan and GenSL-Makespan to capture client-helper scheduling with task precedences. It proves strong hardness results (including NP-hardness and inapproximability) for these problems, even under highly restricted instances, and provides a practical 5-approximation for SL-Makespan. For the more general GenSL-Makespan, it shows CH-Assign is strongly NP-hard, and proposes EquiDistributed (EquiD), an IP-guided heuristic drawing on the 5-approximation framework that performs well in experiments. Numerical evaluations on open-source and synthetic data demonstrate that EquiD achieves near-optimal makespans with substantial speedups over exact solvers and outperforms existing heuristics, highlighting its potential for scalable SL deployment in heterogeneous IoT settings.

Abstract

Split learning recently emerged as a solution for distributed machine learning with heterogeneous IoT devices, where clients can offload part of their training to computationally powerful helpers. The core challenge in split learning is to minimize the training time by jointly devising the client-helper assignment and the schedule of tasks at the helpers. We first study the model where each helper has a memory cardinality constraint on how many clients it may be assigned, which represents the case of homogeneous tasks. Through complexity theory, we rule out exact polynomial-time algorithms and approximation schemes even for highly restricted instances of this problem. We complement these negative results with a non-trivial polynomial-time 5-approximation algorithm. Building on this, we then focus on the more general heterogeneous task setting considered by Tirana et al. [INFOCOM 2024], where helpers have memory capacity constraints and clients have variable memory costs. In this case, we prove that, unless P=NP, the problem cannot admit a polynomial-time approximation algorithm for any approximation factor. However, by adapting our aforementioned 5-approximation algorithm, we develop a novel heuristic for the heterogeneous task setting and show that it outperforms heuristics from prior works through extensive experiments.

Paper Structure (12 sections, 5 theorems, 4 figures, 1 table)

Introduction
Our Contributions
Related Work
Problem Formulations
The Steps of SL Training
Training Makespan of a Batch
SL-Makespan: Hardness and Approximability
GenSL-Makespan: Hardness and Heuristic
Numerical Evaluations
Experimental Setup
Insights
Conclusion and Future Work

Key Result

Theorem 1

Even when restricted to instances such that $G$ is a complete bipartite graph, $r_j=\ell_j=p_{ij}'=r_j'=0$, and $p_{ij}=p_{i'j}$ for all $i,i'\in \mathcal{I}$ and $j\in \mathcal{J}$, SL-Makespan is (1) strongly NP-hard even if $M_1=\dots=M_I=3$ and (2) W[1]-hard parameterized by $I$.
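
To give a feel for the restricted instances in Theorem 1, the following is a minimal brute-force sketch, not the paper's algorithm. It rests on assumptions that this page does not spell out: that $p_{ij}$ is the processing time of client $j$ at helper $i$, that $M_i$ is the maximum number of clients helper $i$ may serve, and that with all other parameters set to zero the makespan reduces to the largest total processing time on any helper. The function name `min_makespan_restricted` and its arguments are hypothetical.

```python
from itertools import product


def min_makespan_restricted(p, M):
    """Exhaustively search client-helper assignments (illustrative only).

    p[j]: processing time of client j, assumed identical at every helper.
    M[i]: assumed cardinality limit, i.e. max clients helper i may serve.
    Returns (best_makespan, assignment), or (None, None) if infeasible.
    """
    n_helpers = len(M)
    best, best_assign = None, None
    # Complete bipartite G: every client may be assigned to every helper.
    for assign in product(range(n_helpers), repeat=len(p)):
        counts = [0] * n_helpers
        loads = [0] * n_helpers
        for j, i in enumerate(assign):
            counts[i] += 1
            loads[i] += p[j]
        # Skip assignments that exceed any helper's cardinality limit.
        if any(c > m for c, m in zip(counts, M)):
            continue
        makespan = max(loads)
        if best is None or makespan < best:
            best, best_assign = makespan, assign
    return best, best_assign


if __name__ == "__main__":
    # Six clients, two helpers with M_1 = M_2 = 3.
    makespan, assignment = min_makespan_restricted([4, 5, 6, 7, 8, 6], [3, 3])
    print(makespan)  # 18
```

The search enumerates $I^J$ assignments, so it is only usable on toy inputs; that exponential blow-up is in line with the hardness the theorem establishes for this setting.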

Figures (4)

Figure 1: An example of client-helper assignments and scheduling decisions. Processing tasks 1 to 5 (T1--T5) correspond to the model parts of Client 1.
Figure 2: Batch makespan (in sec) for different problem instances achieved by our algorithm EquiD, the baseline ED-FCFS, and B-G, the algorithm from Tirana et al. [INFOCOM 2024]. Note that our algorithm EquiD computes a smaller makespan than the other two methods in every scenario except one (ResNet101-MNIST with 75 clients and 5 helpers), where only ED-FCFS computes a slightly smaller makespan.
Figure 3: Relative difference in terms of how much larger the makespan given by B-G is than the makespan given by EquiD for different levels of heterogeneity. Here, the time measurements of ResNet101 and CIFAR10 were used.
Figure 4: Makespan achieved by EquiD as the number of clients and helpers varies (for ResNet101 and MNIST).

Theorems & Definitions (9)

Theorem 1
proof
Theorem 2
Theorem 3
proof
Theorem 4
proof
Theorem 5
proof
