Workflow Optimization for Parallel Split Learning

Joana Tirana, Dimitra Tsigkari, George Iosifidis, Dimitris Chatzopoulos

TL;DR

This paper addresses the challenge of minimizing the makespan in parallel Split Learning by jointly optimizing client-to-helper assignments and the forward/backward propagation schedule in heterogeneous networks. It introduces a two-subproblem decomposition: forward-propagation optimization solved via an ADMM-based method and backward-propagation scheduling solved in polynomial time given the forward plan, complemented by a scalable balanced-greedy heuristic. The authors prove NP-hardness of the joint problem and demonstrate near-optimal performance with substantial speedups over exact ILP solvers in testbed-driven experiments using CIFAR-10 with ResNet101 and VGG19 on various hardware. The work has practical implications for accelerating distributed training on resource-constrained devices, with guidance on selecting the appropriate method based on system size and heterogeneity. Future directions include optimizing neural network cut layers per client and incorporating energy-aware considerations.

Abstract

Split learning (SL) has recently been proposed as a way to enable resource-constrained devices to train multi-parameter neural networks (NNs) and participate in federated learning (FL). In a nutshell, SL splits the NN model into parts and allows clients (devices) to offload the largest part as a processing task to a computationally powerful helper. In parallel SL, multiple helpers can process model parts of one or more clients, thus considerably reducing the maximum training time over all clients (makespan). In this paper, we focus on orchestrating the workflow of this operation, which is critical in highly heterogeneous systems, as our experiments show. In particular, we formulate the joint problem of client-helper assignments and scheduling decisions with the goal of minimizing the training makespan, and we prove that it is NP-hard. We propose a solution method based on the decomposition of the problem by leveraging its inherent symmetry, and a second one that is fully scalable. A wealth of numerical evaluations using our testbed's measurements allows us to build a solution strategy comprising these methods. Moreover, we show that this strategy finds a near-optimal solution and achieves a makespan up to 52.3% shorter than that of the baseline scheme.
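
To make the assignment side of the problem concrete, the following is a minimal sketch of a generic greedy load-balancing rule for heterogeneous helpers. It is not the paper's balanced-greedy algorithm or its ADMM-based method, whose details are not given here. The function name `greedy_assign` and the matrix `proc_time` are hypothetical, and the sketch assumes each client's offloaded task is a single block of work, ignoring transmission times, queuing, and the forward/backward propagation schedule.

```python
from typing import List, Tuple


def greedy_assign(proc_time: List[List[float]]) -> Tuple[List[int], float]:
    """Assign each client to one helper, trying to keep the makespan small.

    proc_time[i][j] is a hypothetical time for helper j to process the
    offloaded model part of client i (an assumption for illustration).
    Returns (assignment, makespan), where assignment[i] is the helper
    chosen for client i and makespan is the largest helper load.
    """
    num_clients = len(proc_time)
    num_helpers = len(proc_time[0])
    load = [0.0] * num_helpers        # accumulated work per helper
    assignment = [-1] * num_clients

    # Place the most demanding clients first (largest best-case time).
    order = sorted(range(num_clients),
                   key=lambda i: min(proc_time[i]),
                   reverse=True)

    for i in order:
        # Choose the helper that would finish earliest after taking client i.
        best = min(range(num_helpers),
                   key=lambda h: load[h] + proc_time[i][h])
        assignment[i] = best
        load[best] += proc_time[i][best]

    return assignment, max(load)


# Hypothetical example: 4 clients, 2 helpers, helper 0 is mostly faster.
example = [[4, 8], [3, 6], [2, 2], [5, 10]]
assignment, makespan = greedy_assign(example)
print(assignment, makespan)
```

For this hypothetical input, the rule sends client 0 to the slower helper 1 and the remaining clients to helper 0, for a makespan of 10. A greedy rule of this kind carries no optimality guarantee; the paper's NP-hardness result for the joint problem is the reason such heuristics are attractive at scale.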

Paper Structure

This paper contains 10 sections, 2 theorems, 16 equations, 8 figures, 2 tables, 1 algorithm.

Key Result

Theorem 1

$\mathbb{P}$ (Problem 1) is NP-hard.

Figures (8)

  • Figure 1: Parallel SL in this work. The considered network topology, its resources, and the processing tasks per entity.
  • Figure 2: The workflow of the batch processing for a single client and helper pair, and the corresponding times (processing and transmission). The queuing delay that a client might experience at the helper is not depicted here.
  • Figure 3: The roadmap to our ADMM-based solution method.
  • Figure 4: Algorithm 2 for optimal bwd-prop schedule in a toy example of 5 clients and 1 helper.
  • Figure 5: Profiled computing time (ms) of part-1 for each device.
  • ...and 3 more figures

Theorems & Definitions (4)

  • Theorem 1
  • proof
  • Theorem 2
  • proof