Personalized Hierarchical Split Federated Learning in Wireless Networks

Md-Ferdous Pervej, Andreas F. Molisch

TL;DR

The paper tackles personalization in resource-limited wireless networks by proposing Personalized Hierarchical Split Federated Learning (PHSFL), which trains only the body of a split model on clients while keeping the classifier frozen during global rounds, and then personalizes each client's model by fine-tuning its classifier (head). By integrating split learning with hierarchical federated learning, PHSFL reduces communication and computation while accommodating multi-tier network heterogeneity. A theoretical convergence bound for the average global gradient norm is derived under standard assumptions. Simulations on CIFAR-10 show that global generalization is comparable to HSFL, while per-client fine-tuning yields notable personalization gains, especially under non-IID data. These findings point to a practical path toward scalable, personalized ML in wireless edge networks with limited device resources.
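The multi-tier aggregation mentioned above can be illustrated with a minimal sketch. It assumes a two-tier layout (clients report to edge servers, edge servers report to a central server) and data-size-proportional weights; the summary does not state the paper's exact weighting, and the names `weighted_average` and `hierarchical_aggregate` are hypothetical. The vectors stand in for the trainable body parameters.

```python
from typing import Dict, List

Vector = List[float]


def weighted_average(vectors: List[Vector], weights: List[float]) -> Vector:
    """Return the weighted mean of equally sized vectors (weights are normalized)."""
    total = float(sum(weights))
    dim = len(vectors[0])
    return [sum(w * v[j] for v, w in zip(vectors, weights)) / total for j in range(dim)]


def hierarchical_aggregate(
    client_models: Dict[str, List[Vector]],
    client_sizes: Dict[str, List[int]],
) -> Vector:
    """Aggregate clients at each edge server, then aggregate the edge models.

    Assumption: weights are proportional to local dataset sizes at both tiers.
    """
    edge_models: List[Vector] = []
    edge_sizes: List[float] = []
    for edge, models in client_models.items():
        sizes = client_sizes[edge]
        edge_models.append(weighted_average(models, [float(s) for s in sizes]))
        edge_sizes.append(float(sum(sizes)))
    return weighted_average(edge_models, edge_sizes)


if __name__ == "__main__":
    # Hypothetical toy setup: two edge servers, three clients in total.
    models = {"bs0": [[1.0, 2.0], [3.0, 4.0]], "bs1": [[5.0, 6.0]]}
    sizes = {"bs0": [10, 30], "bs1": [60]}
    print(hierarchical_aggregate(models, sizes))
```

With size-proportional weights at both tiers, the two-step average equals a single flat weighted average over all clients; the tiers differ in practice because edge and global aggregations can run at different frequencies.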

Abstract

Extreme resource constraints make large-scale machine learning (ML) with distributed clients challenging in wireless networks. On the one hand, large-scale ML requires massive information exchange between clients and server(s). On the other hand, these clients have limited battery and computation power, much of which is dedicated to operational computations. Split federated learning (SFL) is emerging as a potential solution to mitigate these challenges by splitting the ML model into client-side and server-side model blocks, where only the client-side block is trained on the client device. However, practical applications require personalized models that are suitable for the client's personal task. Motivated by this, we propose a personalized hierarchical split federated learning (PHSFL) algorithm that is specifically designed to achieve better personalization performance. More specifically, because many features have similar attributes regardless of how severely the statistical data distributions differ across the clients, we only train the body part of the federated learning (FL) model while keeping the (randomly initialized) classifier frozen during the training phase. We first perform extensive theoretical analysis to understand the impact of model splitting and hierarchical model aggregations on the global model. Once the global model is trained, we fine-tune each client classifier to obtain the personalized models. Our empirical findings suggest that while the globally trained model with the untrained classifier performs quite similarly to other existing solutions, the fine-tuned models show significantly improved personalized performance.
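The two-phase idea described in the abstract (train the body with a frozen classifier, then fine-tune each client's classifier) can be sketched on a toy problem. This is an illustration under strong assumptions, not the paper's algorithm: the model is a scalar "body" times a scalar "head", aggregation is a plain average over two clients, and all names, step sizes, and data are hypothetical.

```python
from typing import Dict, List, Tuple

Sample = Tuple[float, float]  # (input x, target y)


def predict(body: float, head: float, x: float) -> float:
    """Toy split model: the body maps x to a feature, the head maps it to an output."""
    return head * (body * x)


def mse(body: float, head: float, data: List[Sample]) -> float:
    return sum((predict(body, head, x) - y) ** 2 for x, y in data) / len(data)


def train_body(body: float, head: float, data: List[Sample], lr: float, steps: int) -> float:
    """Gradient steps on the body only; the head stays frozen."""
    for _ in range(steps):
        grad = sum(2 * (predict(body, head, x) - y) * head * x for x, y in data) / len(data)
        body -= lr * grad
    return body


def fine_tune_head(body: float, head: float, data: List[Sample], lr: float, steps: int) -> float:
    """Gradient steps on the head only; the trained body stays fixed."""
    for _ in range(steps):
        grad = sum(2 * (predict(body, head, x) - y) * body * x for x, y in data) / len(data)
        head -= lr * grad
    return head


def run_demo() -> Dict[str, Tuple[float, float]]:
    # Hypothetical non-IID clients: same inputs, different target slopes.
    clients = {"a": [(1.0, 2.0), (2.0, 4.0)], "b": [(1.0, 3.0), (2.0, 6.0)]}
    frozen_head = 0.5  # stand-in for a randomly initialized, untrained classifier
    body = 1.0
    for _ in range(50):  # global rounds: local body training, then averaging
        local = [train_body(body, frozen_head, d, lr=0.1, steps=5) for d in clients.values()]
        body = sum(local) / len(local)
    results = {}
    for name, data in clients.items():
        head_u = fine_tune_head(body, frozen_head, data, lr=0.005, steps=10)
        results[name] = (mse(body, frozen_head, data), mse(body, head_u, data))
    return results


if __name__ == "__main__":
    for name, (global_loss, personal_loss) in run_demo().items():
        print(name, global_loss, personal_loss)
```

In this toy setting the shared body settles on a compromise between the two clients, and each client's fine-tuned head then absorbs the client-specific difference, which is the intuition the abstract gives for freezing the classifier during global training.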

Paper Structure

This paper contains 25 sections, 4 theorems, 47 equations, 4 figures.

Key Result

Theorem 1

Suppose the above assumptions hold. Then, if the learning rate satisfies $\eta < \frac{1}{2\sqrt{5} \beta \kappa_1 \kappa_0}$, the average global gradient norm is upper bounded by [...], where $\Gamma_0 \coloneqq 4\beta^2 \eta^2 \kappa_0^2 - 4 \beta^2 \eta^2 \kappa_0^2 \sum_{b=0}^{B-1} \alpha_b \sum_{u \in \mathcal{U}_b} \alpha_{u}^2$, $\Gamma_1 \coloneqq 80 \kappa_1^2 \beta^4 \eta^4 \kappa_0^4 + 4 \dots$
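The step-size condition in Theorem 1 is easy to check numerically. The sketch below only evaluates the stated inequality $\eta < \frac{1}{2\sqrt{5} \beta \kappa_1 \kappa_0}$; the meanings of $\beta$, $\kappa_1$, and $\kappa_0$ are not given in this excerpt, so they are treated here as positive constants, and the function names and example values are hypothetical.

```python
import math


def step_size_upper_limit(beta: float, kappa1: float, kappa0: float) -> float:
    """Return 1 / (2 * sqrt(5) * beta * kappa1 * kappa0), the limit from Theorem 1.

    Assumption: all three constants are strictly positive.
    """
    if min(beta, kappa1, kappa0) <= 0:
        raise ValueError("beta, kappa1 and kappa0 must be positive")
    return 1.0 / (2.0 * math.sqrt(5.0) * beta * kappa1 * kappa0)


def satisfies_step_size_condition(eta: float, beta: float, kappa1: float, kappa0: float) -> bool:
    """Check the strict inequality eta < limit (a positive eta is assumed here)."""
    return 0.0 < eta < step_size_upper_limit(beta, kappa1, kappa0)


if __name__ == "__main__":
    # Hypothetical constants chosen only for illustration.
    limit = step_size_upper_limit(beta=1.0, kappa1=2, kappa0=5)
    print(limit)
    print(satisfies_step_size_condition(0.01, 1.0, 2, 5))
    print(satisfies_step_size_condition(0.05, 1.0, 2, 5))
```

Because the limit scales as $1/(\beta \kappa_1 \kappa_0)$, doubling any one of the three constants halves the largest admissible learning rate.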

Figures (4)

  • Figure 1: Globally trained model's performance on CIFAR-10: $65.36\%$, $83.93\%$, and $33.33\%$ mean, maximum, and minimum test accuracy, respectively, across $100$ clients, when data samples are distributed following $\mathrm{Dir}(\pmb{\alpha}=\mathbf{0.1})$ [pervej2023resource]
  • Figure 2: Test performance comparisons on CIFAR-10 across $U=100$ users, when $\mathrm{Dir}(\pmb{\alpha}=\mathbf{0.5})$
  • Figure 3: Test performance comparisons on CIFAR-10 across $U=100$ users, when $\mathrm{Dir}(\pmb{\alpha}=\mathbf{0.1})$: $\mathbf{w}^{*}$ and $\mathbf{w}_u^{K=10}$ represent the globally trained model and the fine-tuned personalized model, respectively
  • Figure 4: Test performance comparisons on CIFAR-10 across $U=100$ users, when $\mathrm{Dir}(\pmb{\alpha}=\mathbf{0.5})$: $\mathbf{w}^{*}$ and $\mathbf{w}_u^{K=10}$ represent the globally trained model and the fine-tuned personalized model, respectively
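The captions above refer to data distributed following $\mathrm{Dir}(\pmb{\alpha})$, where a smaller concentration gives more skewed (more non-IID) client datasets. The sketch below shows one common way to build such a split, assuming a symmetric Dirichlet draw per class over clients; the captions do not say which scheme the paper uses, and the function names are hypothetical.

```python
import random
from typing import List


def sample_dirichlet(alpha: float, dim: int, rng: random.Random) -> List[float]:
    """Draw from a symmetric Dirichlet by normalizing independent Gamma(alpha, 1) samples."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(dim)]
    total = sum(draws)
    if total == 0.0:  # numerical guard for very small alpha
        return [1.0 / dim] * dim
    return [d / total for d in draws]


def dirichlet_partition(labels: List[int], num_clients: int, alpha: float, seed: int = 0) -> List[List[int]]:
    """Split sample indices across clients, class by class.

    Assumption: for each class, client shares are drawn from Dir(alpha).
    """
    rng = random.Random(seed)
    client_indices: List[List[int]] = [[] for _ in range(num_clients)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        shares = sample_dirichlet(alpha, num_clients, rng)
        start, cumulative = 0, 0.0
        for c in range(num_clients):
            cumulative += shares[c]
            # The last client takes the remainder so every index is assigned once.
            end = len(idx) if c == num_clients - 1 else int(round(cumulative * len(idx)))
            end = min(max(end, start), len(idx))
            client_indices[c].extend(idx[start:end])
            start = end
    return client_indices


if __name__ == "__main__":
    # Hypothetical toy labels: 10 classes, 100 samples each.
    toy_labels = [i % 10 for i in range(1000)]
    parts = dirichlet_partition(toy_labels, num_clients=10, alpha=0.1, seed=0)
    print([len(p) for p in parts])
```

Running the same split with a larger `alpha` (for example 0.5 instead of 0.1) typically spreads each class over more clients, which matches the milder heterogeneity setting named in the captions.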

Theorems & Definitions (11)

  • Remark 1: Communication overheads
  • Remark 2: Choice of the cut layer
  • Theorem 1
  • proof
  • Remark 3
  • Theorem 1
  • proof
  • Lemma 1
  • proof
  • Lemma 2
  • ...and 1 more