Hierarchical Split Federated Learning: Convergence Analysis and System Optimization

Zheng Lin, Wei Wei, Zhe Chen, Chan-Tong Lam, Xianhao Chen, Yue Gao, Jun Luo

TL;DR

HSFL extends split federated learning to multi-tier cloud-edge systems by partitioning a neural network across tiers and optimizing where to split the model (model splitting, MS) and how often to aggregate each sub-model (model aggregation, MA). The authors derive a convergence bound that accounts for tiered aggregation frequencies and layer cuts, then formulate a joint MS/MA latency-minimization problem and solve it with a block coordinate descent (BCD) approach that relies on approximations and on mixed-integer linear fractional programming (MILFP) steps handled with Dinkelbach's algorithm. The proposed framework, validated on CIFAR-10 and MNIST with VGG-16, shows faster convergence and higher accuracy than several baselines, particularly under non-IID data and constrained network resources, while remaining robust to resource variations. These results suggest that HSFL can accelerate training in large-scale, heterogeneous edge environments and make it more practical to train large models at the edge.
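To make the Dinkelbach step mentioned above more concrete, the following is a minimal sketch of the generic Dinkelbach iteration for minimizing a ratio over a small finite candidate set. It is not the authors' solver: the function name `dinkelbach`, the toy numerator and denominator, and the candidate set are all illustrative assumptions, and the sketch assumes the denominator is strictly positive.

```python
from typing import Callable, Sequence, Tuple


def dinkelbach(
    candidates: Sequence[float],
    numerator: Callable[[float], float],
    denominator: Callable[[float], float],
    tol: float = 1e-9,
    max_iter: int = 100,
) -> Tuple[float, float]:
    """Minimise numerator(x) / denominator(x) over a finite candidate set.

    Assumption: denominator(x) > 0 for every candidate.
    """
    x = candidates[0]
    ratio = numerator(x) / denominator(x)
    for _ in range(max_iter):
        # Parametric subproblem: minimise N(x) - ratio * D(x) over the candidates.
        x = min(candidates, key=lambda c: numerator(c) - ratio * denominator(c))
        gap = numerator(x) - ratio * denominator(x)
        # Update the ratio estimate with the new minimiser.
        ratio = numerator(x) / denominator(x)
        # At the optimal ratio the subproblem's optimal value is zero.
        if abs(gap) < tol:
            break
    return x, ratio


# Toy problem (hypothetical): minimise (x^2 + 9) / x over x in {1, ..., 5}.
best_x, best_ratio = dinkelbach(
    candidates=[1, 2, 3, 4, 5],
    numerator=lambda x: x * x + 9,
    denominator=lambda x: x,
)
print(best_x, best_ratio)
```

The design point is that each iteration replaces the fractional objective with a non-fractional parametric one, which is why the technique pairs naturally with mixed-integer subproblems; here the inner problem is solved by brute force only because the candidate set is tiny.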

Abstract

As AI models expand in size, it has become increasingly challenging to deploy federated learning (FL) on resource-constrained edge devices. To tackle this issue, split federated learning (SFL) has emerged as an FL framework with reduced workload on edge devices via model splitting; it has received extensive attention from the research community in recent years. Nevertheless, most prior works on SFL focus only on a two-tier architecture without harnessing multi-tier cloud-edge computing resources. In this paper, we intend to analyze and optimize the learning performance of SFL under multi-tier systems. Specifically, we propose the hierarchical SFL (HSFL) framework and derive its convergence bound. Based on the theoretical results, we formulate a joint optimization problem for model splitting (MS) and model aggregation (MA). To solve this rather hard problem, we then decompose it into MS and MA subproblems that can be solved via an iterative descending algorithm. Simulation results demonstrate that the tailored algorithm can effectively optimize MS and MA for SFL within virtually any multi-tier system.

Paper Structure

This paper contains 16 sections, 4 theorems, 45 equations, 9 figures, 2 algorithms.

Key Result

Lemma 1

Under Assumption asp:1, Algorithm HSFL_procedure ensures where ${\mathbbm{1}}_{\{\cdot\}}$ denotes the indicator function, $I_m$ represents the model aggregation interval of the $m$-th tier, and ${\mathbf{\overline w}}_m^{t}$ is defined in Eqn. h_c_define.
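As a rough illustration of how an indicator function and a per-tier aggregation interval $I_m$ can interact, the sketch below marks the rounds at which each tier would aggregate. It assumes, without confirmation from the text above, that tier $m$ aggregates whenever the round index is a multiple of $I_m$. The helper names are hypothetical, and the interval values $I_1=140$ and $I_2=20$ are borrowed from the Figure 2 caption purely as example inputs.

```python
from typing import Dict, List


def aggregation_indicator(t: int, interval: int) -> int:
    """Return 1 if round t is a multiple of the interval, else 0 (assumed rule)."""
    return 1 if t % interval == 0 else 0


def aggregation_rounds(total_rounds: int, intervals: Dict[int, int]) -> Dict[int, List[int]]:
    """Map each tier to the 1-indexed rounds at which its sub-model is aggregated."""
    return {
        tier: [t for t in range(1, total_rounds + 1) if aggregation_indicator(t, interval)]
        for tier, interval in intervals.items()
    }


# Example inputs: tier 1 (client-side) every 140 rounds, tier 2 (edge-side) every 20.
schedule = aggregation_rounds(280, {1: 140, 2: 20})
for tier, rounds in schedule.items():
    print(f"tier {tier}: {len(rounds)} aggregations, first at round {rounds[0]}")
```

Under this assumed rule, a larger interval means fewer aggregation events and hence less communication for that tier's sub-model over the same number of rounds.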

Figures (9)

  • Figure 1: An illustration of HSFL over multi-tier computing systems, where a($m$) and c($m$) denote the forward propagation (FP) and backward propagation (BP) of the $m$-th tier sub-model, b($m$) and d($m$) denote the transmission of activations and of activations' gradients between the $m$-th tier and the ($m$+1)-th tier, and e($m$), f($m$), and g($m$) represent the uplink upload, aggregation, and downlink transmission of the $m$-th tier sub-model, respectively.
  • Figure 2: A comparison of two-tier and three-tier client-edge-cloud SFL, and the impact of sub-model MA and MS on training performance and overhead. Fig. \ref{sfig:three_two_tier_compare} compares two- and three-tier SFL in terms of test accuracy versus training time. Fig. \ref{sfig:moti_1_accuracy_communication_overhead} shows test accuracy versus communication overhead for different MA intervals of the client-side sub-model ($I_1$) and the edge-side sub-model ($I_2$), given cut layers $L_1=3$ and $L_2=8$. Fig. \ref{sfig:model_split_compu_commu} presents the per-round end-to-end latency versus cut layers, revealing the complex and significant impact of the cut layer on communication-computing latency. Fig. \ref{sfig:accuracy_epoch} shows test accuracy versus epochs under different cut layers, given MA intervals $I_1=140$ and $I_2=20$, indicating that MS has a non-trivial impact on model convergence. The experiment is conducted on the CIFAR-10 dataset under the non-IID setting. The transmission rate between edge devices and the cloud server is set to 15 Mbps \cite{li2014towards}. The other experimental parameters are consistent with Sec. \ref{simu_results}.
  • Figure 3: An illustration of split training and sub-model aggregation stages.
  • Figure 4: The training performance on CIFAR-10 and MNIST datasets under IID and non-IID settings using VGG-16.
  • Figure 5: The converged test accuracy and time on CIFAR-10 and MNIST datasets under IID and non-IID settings using VGG-16.
  • ...and 4 more figures

Theorems & Definitions (7)

  • Lemma 1
  • proof
  • Theorem 1
  • proof
  • Corollary 1
  • Proposition 1
  • proof