Split Federated Learning Architectures for High-Accuracy and Low-Delay Model Training

Yiannis Papageorgiou, Yannis Thomas, Ramin Khalili, Iordanis Koutsopoulos

TL;DR

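This work formulates the joint choice of partitioning layers and client-to-aggregator assignments in Hierarchical Split Federated Learning (HSFL) as an optimization problem, proves that the problem is NP-hard, and proposes the first heuristic algorithm that explicitly accounts for model accuracy while remaining delay-efficient relative to state-of-the-art SFL and HSFL schemes.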

Abstract

Can we find a network architecture for ML model training so as to optimize training loss (and thus, accuracy) in Split Federated Learning (SFL)? And can this architecture also reduce training delay and communication overhead? While accuracy is not influenced by how we split the model in ordinary, state-of-the-art SFL, in this work we answer the questions above in the affirmative. Recent Hierarchical SFL (HSFL) architectures adopt a three-tier training structure consisting of clients, (local) aggregators, and a central server. In this architecture, the model is partitioned at two partitioning layers into three sub-models, which are executed across the three tiers. Despite their merits, HSFL architectures overlook the impact of the partitioning layers and client-to-aggregator assignments on accuracy, delay, and overhead. This work explicitly captures the impact of the partitioning layers and client-to-aggregator assignments on accuracy, delay and overhead by formulating a joint optimization problem. We prove that the problem is NP-hard and propose the first accuracy-aware heuristic algorithm that explicitly accounts for model accuracy, while remaining delay-efficient. Simulation results on public datasets show that our approach can improve accuracy by 3%, while reducing delay by 20% and overhead by 50%, compared to state-of-the-art SFL and HSFL schemes.
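
The three-way partitioning described in the abstract can be illustrated with a minimal sketch. It assumes the model is an ordered list of layers and that two 1-indexed positions, here called `aggregator_layer` and `cut_layer`, mark where the client sub-model and the aggregator sub-model end. The function name, the layer names, and the requirement that every sub-model be non-empty are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Sequence, Tuple


def partition_model(
    layers: Sequence[str], aggregator_layer: int, cut_layer: int
) -> Tuple[List[str], List[str], List[str]]:
    """Split an ordered layer list into three sub-models.

    Hypothetical convention (an assumption, not the paper's notation):
      - layers 1..aggregator_layer            -> client sub-model (a)
      - layers aggregator_layer+1..cut_layer  -> aggregator sub-model (b)
      - layers cut_layer+1..len(layers)       -> server sub-model (c)
    """
    # Assumption: each tier must hold at least one layer.
    if not 1 <= aggregator_layer < cut_layer < len(layers):
        raise ValueError(
            "need 1 <= aggregator_layer < cut_layer < number of layers"
        )
    client = list(layers[:aggregator_layer])
    aggregator = list(layers[aggregator_layer:cut_layer])
    server = list(layers[cut_layer:])
    return client, aggregator, server


if __name__ == "__main__":
    # Hypothetical layer names for an 8-layer network.
    names = ["conv1", "conv2", "conv3", "conv4", "conv5", "fc1", "fc2", "fc3"]
    a, b, c = partition_model(names, aggregator_layer=2, cut_layer=5)
    print("clients:    ", a)
    print("aggregators:", b)
    print("server:     ", c)
```

Under this convention, moving either position changes how many layers each tier trains, which is the decision the paper optimizes jointly with the client-to-aggregator assignments.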

Paper Structure (25 sections, 9 equations, 9 figures, 9 tables, 2 algorithms)

Figures (9)

  • Figure 1: Example network topology that presents a network architecture with three clients, where each client trains the model in collaboration with the server. Client 1, being computationally stronger, is selected as a local aggregator. The model is partitioned into three sub-models (a, b, and c) as defined by the aggregator layer and the cut layer. The clients, local aggregators and server train the first (a), middle (b) and last (c) sub-model respectively.
  • Figure 2: Test accuracy versus training epochs for different cut layers during the training of AlexNet and VGG-11 models across 100 clients with local-loss learning.
  • Figure 3: Batch processing with one server and three clients, where client 3 acts as the local aggregator. The figure depicts the training tasks (rectangles) executed by each client over time (x-axis). The horizontal blue lines mark the execution zone of each node; tasks within this zone are executed by the corresponding node, and multi-threaded execution is implied when multiple tasks in the same zone overlap in time. For each task, we present the trained sub-model, the training "direction" (FP or BP) and the trained layers, e.g., $W^w_3$ $FP(1,h)$ denotes the FP of layers 1 to h in the weak-side sub-model of client 3. The arrows symbolize the network transmission of activations and gradients in the training chain.
  • Figure 4: Test accuracy versus (a) training delay and (b) communication overhead achieved by AA HSFL-ll and baseline schemes during the training of the AlexNet model on the MNIST dataset.
  • Figure 5: Test accuracy versus (a) training delay and (b) communication overhead achieved by AA HSFL-ll and baseline schemes during the training of the VGG-11 model on the CIFAR-10 dataset.
  • ...and 4 more figures