Split Federated Learning Architectures for High-Accuracy and Low-Delay Model Training

Yiannis Papageorgiou; Yannis Thomas; Ramin Khalili; Iordanis Koutsopoulos

TL;DR

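This work formulates the joint selection of partitioning layers and client-to-aggregator assignments in Hierarchical Split Federated Learning (HSFL) as an optimization problem, proves that the problem is NP-hard, and proposes the first heuristic algorithm that explicitly accounts for model accuracy while remaining delay-efficient. In simulations on public datasets, the approach can improve accuracy by 3% while reducing delay by 20% and overhead by 50%, compared to state-of-the-art SFL and HSFL schemes.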

Abstract

Can we find a network architecture for ML model training so as to optimize training loss (and thus, accuracy) in Split Federated Learning (SFL)? And can this architecture also reduce training delay and communication overhead? While accuracy is not influenced by how we split the model in ordinary, state-of-the-art SFL, in this work we answer the questions above in the affirmative. Recent Hierarchical SFL (HSFL) architectures adopt a three-tier training structure consisting of clients, (local) aggregators, and a central server. In this architecture, the model is partitioned at two partitioning layers into three sub-models, which are executed across the three tiers. Despite their merits, HSFL architectures overlook the impact of the partitioning layers and client-to-aggregator assignments on accuracy, delay, and overhead. This work explicitly captures the impact of the partitioning layers and client-to-aggregator assignments on accuracy, delay and overhead by formulating a joint optimization problem. We prove that the problem is NP-hard and propose the first accuracy-aware heuristic algorithm that explicitly accounts for model accuracy, while remaining delay-efficient. Simulation results on public datasets show that our approach can improve accuracy by 3%, while reducing delay by 20% and overhead by 50%, compared to state-of-the-art SFL and HSFL schemes.
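To make the three-tier partitioning concrete, the following minimal sketch splits an ordered list of layers into the three sub-models trained by clients, local aggregators, and the server. It is an illustration only, not the authors' algorithm: the function name `partition_layers`, the layer names, and the requirement that every tier holds at least one layer are assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def partition_layers(
    layers: Sequence[str], aggregator_layer: int, cut_layer: int
) -> Tuple[List[str], List[str], List[str]]:
    """Split an ordered layer list into client, aggregator, and server parts.

    Layers are counted from 1. The client part covers layers
    1..aggregator_layer, the aggregator part covers layers
    aggregator_layer+1..cut_layer, and the server part covers the rest.
    Assumption (not from the paper): each tier gets at least one layer.
    """
    n = len(layers)
    if not 1 <= aggregator_layer < cut_layer < n:
        raise ValueError(
            "need 1 <= aggregator_layer < cut_layer < number of layers"
        )
    client_part = list(layers[:aggregator_layer])
    aggregator_part = list(layers[aggregator_layer:cut_layer])
    server_part = list(layers[cut_layer:])
    return client_part, aggregator_part, server_part


# Hypothetical five-layer model, used only for illustration.
example_layers = ["conv1", "conv2", "conv3", "fc1", "fc2"]
client, aggregator, server = partition_layers(example_layers, 1, 3)
print(client, aggregator, server)
```

Moving either partitioning layer changes how much of the model each tier trains, which is the degree of freedom the paper's optimization problem selects jointly with the client-to-aggregator assignments.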

Paper Structure (25 sections, 9 equations, 9 figures, 9 tables, 2 algorithms)


Introduction
Related Work
Accuracy-Aware Hierarchical Split Federated Learning with Local Loss
System design
Network
Model partitioning
Training
Batch processing
Local & Global aggregation
Training delay analysis
Problem Formulation -- Joint selection of partitioning layers & client-to-aggregator assignments
Problem's NP-hardness
Accuracy aware model partitioning and aggregator assignment algorithm
Algorithm 1: Identification of candidate cut layers
...and 10 more sections

Figures (9)

Figure 1: Example network topology with three clients, each of which trains the model in collaboration with the server. Client 1, being computationally stronger, is selected as a local aggregator. The model is partitioned into three sub-models (a, b, and c) as defined by the aggregator layer and the cut layer. The clients, local aggregators, and server train the first (a), middle (b), and last (c) sub-models, respectively.
Figure 2: Test accuracy versus training epochs for different cut layers during the training of AlexNet and VGG-11 models across 100 clients with local-loss learning.
Figure 3: Batch processing with one server and three clients, where client 3 acts as local aggregator. The figure depicts the training tasks (rectangles) executed by each client over time (x-axis). The horizontal blue lines mark the execution zone of each node; tasks within a zone are executed by the corresponding node, and multi-threaded execution is implied when multiple tasks in the same zone overlap in time. For each task, we present the trained sub-model, the training "direction" (forward propagation, FP, or backward propagation, BP), and the trained layers; e.g., $W^w_3\,FP(1,h)$ denotes the FP of layers 1 to $h$ in the weak-side sub-model of client 3. The arrows symbolize the network transmission of activations and gradients in the training chain.
Figure 4: Test accuracy versus (a) training delay and (b) communication overhead achieved by AA HSFL-ll and baseline schemes during the training of the AlexNet model on the MNIST dataset.
Figure 5: Test accuracy versus (a) training delay and (b) communication overhead achieved by AA HSFL-ll and baseline schemes during the training of the VGG-11 model on the CIFAR-10 dataset.
...and 4 more figures
