
Communication-Efficient Hybrid Federated Learning for E-health with Horizontal and Vertical Data Partitioning

Chong Yu, Shuaiqi Shen, Shiqiang Wang, Kuan Zhang, Hai Zhao

TL;DR

This work tackles the challenge of learning from e-health data partitioned across horizontal and vertical axes by proposing a three-tier Hybrid Federated Learning framework that combines intermediate result exchange with local and global aggregations. The authors introduce Hybrid SGD (HSGD), provide convergence guarantees under standard smoothness and variance assumptions, and derive adaptive strategies to balance convergence quality with communication cost. Empirical results across multi-domain datasets demonstrate substantial reductions in training time and communication overhead while maintaining high accuracy, validating the effectiveness of the adaptive interval tuning and learning-rate adjustments. The framework offers a practical, privacy-preserving approach for collaborative medical AI among wearables, hospitals, and cloud infrastructure, with potential for secure integration and broader adoption in health IT.

Abstract

E-health allows smart devices and medical institutions to collaboratively collect patients' data, which Artificial Intelligence (AI) technologies use to train models that help doctors make diagnoses. By allowing multiple devices to train models collaboratively, federated learning is a promising solution to the communication and privacy issues in e-health. However, applying federated learning in e-health faces several challenges. First, medical data is both horizontally and vertically partitioned. Since Horizontal Federated Learning (HFL) or Vertical Federated Learning (VFL) alone cannot handle both types of data partitioning, applying either one directly may incur excessive communication cost, because part of the raw data must be transmitted to reach high modeling accuracy. Second, a naive combination of HFL and VFL has limitations, including low training efficiency, unsound convergence analysis, and a lack of parameter tuning strategies. In this paper, we provide a thorough study of an effective integration of HFL and VFL that achieves communication efficiency and overcomes the above limitations when data is both horizontally and vertically partitioned. Specifically, we propose a hybrid federated learning framework with one intermediate result exchange and two aggregation phases. Based on this framework, we develop a Hybrid Stochastic Gradient Descent (HSGD) algorithm to train models. Then, we theoretically analyze the convergence upper bound of the proposed algorithm. Using the convergence results, we design adaptive strategies to adjust the training parameters and shrink the size of the transmitted data. Experimental results validate that the proposed HSGD algorithm can achieve the desired accuracy while reducing communication cost, and they also verify the effectiveness of the adaptive strategies.

Paper Structure

This paper contains 19 sections, 10 theorems, 53 equations, 10 figures, 4 tables, 1 algorithm.

Key Result

Theorem 1

Under Assumptions 1 and 2, when the learning rate $\eta$ satisfies $\eta\leq \frac{1}{8P\rho}$, the expected averaged squared gradient of $F$ over $T=RP$ iterations is upper bounded by
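
The learning-rate condition in Theorem 1 can be checked numerically. The sketch below is only an illustration of the inequality $\eta\leq \frac{1}{8P\rho}$ and of $T=RP$; the function names are hypothetical, and the reading of $\rho$ as a smoothness constant, $P$ as the number of iterations per round, and $R$ as the number of rounds is an assumption, since this excerpt does not define them.

```python
def max_learning_rate(P: int, rho: float) -> float:
    """Largest eta permitted by the condition eta <= 1 / (8 * P * rho).

    P and rho are taken from Theorem 1; their interpretation (iterations
    per round, smoothness constant) is an assumption of this sketch.
    """
    if P <= 0 or rho <= 0:
        raise ValueError("P and rho must be positive")
    return 1.0 / (8 * P * rho)


def satisfies_condition(eta: float, P: int, rho: float) -> bool:
    """Return True if eta meets the learning-rate condition of Theorem 1."""
    return 0 < eta <= max_learning_rate(P, rho)


def total_iterations(R: int, P: int) -> int:
    """T = R * P, as stated in Theorem 1."""
    return R * P
```

For example, with $P=5$ and $\rho=2$ the condition allows learning rates up to $1/80 = 0.0125$, so a larger $P$ or a larger $\rho$ tightens the admissible range proportionally.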

Figures (10)

  • Figure 1: E-health architecture: e-health has a three-tier horizontal-vertical-horizontal data distribution structure.
  • Figure 2: Hybrid federated learning framework.
  • Figure 3: HSGD algorithm implementation: the local model of hospital-patient group $m$ is ${\bm\theta}_m:=[({\bm\theta}_{0,m})^{\mathrm{T}},({\bm\theta}_{1,m})^{\mathrm{T}},({\bm\theta}_{2,m})^{\mathrm{T}}]^{\mathrm{T}}$, where ${\bm\theta}_{0,m}$ and ${\bm\theta}_{1,m}$ are trained at the hospital. Edge nodes not only aggregate the models ${\bm\theta}_{2,m,n}, \forall n\in \mathcal{A}_m$, trained on wearable devices, but also forward intermediate results between hospitals and wearable devices. The global model is generated on the cloud server by aggregating local models.
  • Figure 4: Training performance versus time on three datasets. (a)-(c): The proposed algorithm takes less time than the baselines to reach the target training requirements. (c): When the raw data is large, the proposed algorithm starts model training earlier, which further reduces training time.
  • Figure 5: Training performance versus communication cost (per group) on three datasets. The raw data sizes of OrganAMNIST, ESR, and MIMIC-III are 63 MB, 7.3 MB, and 42.3 GB, respectively. The proposed algorithm reduces the communication cost, and its advantage grows as the raw data size increases.
  • ...and 5 more figures
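
The two aggregation phases described in the Figure 3 caption can be sketched as follows. This is a minimal illustration rather than the authors' HSGD implementation: it assumes plain unweighted averaging at both the edge node and the cloud server, represents each model block as a flat list of floats, and leaves out the intermediate result exchange and the gradient steps entirely. All function names are hypothetical.

```python
from typing import List

Vector = List[float]


def average(vectors: List[Vector]) -> Vector:
    """Element-wise unweighted mean of equally sized vectors (an assumption;
    the paper may use a weighted rule)."""
    if not vectors:
        raise ValueError("need at least one vector to aggregate")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def edge_aggregate(device_models: List[Vector]) -> Vector:
    """Edge node: aggregate theta_{2,m,n} over the wearable devices n in A_m."""
    return average(device_models)


def build_local_model(theta0: Vector, theta1: Vector, theta2: Vector) -> Vector:
    """Stack the blocks into theta_m = [theta_{0,m}; theta_{1,m}; theta_{2,m}]."""
    return theta0 + theta1 + theta2


def cloud_aggregate(local_models: List[Vector]) -> Vector:
    """Cloud server: aggregate the local models of all hospital-patient groups."""
    return average(local_models)


# Two hypothetical groups: the first has two wearable devices, the second has one.
theta2_group1 = edge_aggregate([[2.0], [4.0]])
local1 = build_local_model([1.0], [2.0], theta2_group1)
local2 = build_local_model([3.0], [4.0], edge_aggregate([[5.0]]))
global_model = cloud_aggregate([local1, local2])
```

Keeping the edge-level and cloud-level steps as separate functions mirrors the caption's split between aggregation over wearable devices within a group and aggregation over groups.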

Theorems & Definitions (20)

  • Theorem 1
  • Proposition 1
  • proof
  • Proposition 2
  • proof
  • Proposition 3
  • proof
  • Lemma 1
  • proof
  • Lemma 2
  • ...and 10 more