SatFed: A Resource-Efficient LEO Satellite-Assisted Heterogeneous Federated Learning Framework

Yuxin Zhang, Zheng Lin, Zhe Chen, Zihan Fang, Wenjun Zhu, Xianhao Chen, Jin Zhao, Yue Gao

TL;DR

This work proposes SatFed, a resource-efficient satellite-assisted heterogeneous FL framework that implements freshness-based model prioritization queues to optimize the use of highly constrained satellite-ground bandwidth, ensuring the transmission of the most critical models.

Abstract

Traditional federated learning (FL) frameworks rely heavily on terrestrial networks, where coverage limitations and increasing bandwidth congestion significantly hinder model convergence. Fortunately, the advancement of low-Earth orbit (LEO) satellite networks offers promising new communication avenues to augment traditional terrestrial FL. Despite this potential, the limited satellite-ground communication bandwidth and the heterogeneous operating environments of ground devices (including variations in data, bandwidth, and computing power) pose substantial challenges for effective and robust satellite-assisted FL. To address these challenges, we propose SatFed, a resource-efficient satellite-assisted heterogeneous FL framework. SatFed implements freshness-based model prioritization queues to optimize the use of highly constrained satellite-ground bandwidth, ensuring the transmission of the most critical models. Additionally, a multigraph is constructed to capture real-time heterogeneous relationships between devices, including data distribution, terrestrial bandwidth, and computing capability. This multigraph enables SatFed to aggregate satellite-transmitted models into peer guidance, enhancing local training in heterogeneous environments. Extensive experiments with real-world LEO satellite networks demonstrate that SatFed achieves superior performance and robustness compared to state-of-the-art benchmarks.
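
To make the idea of a freshness-based prioritization queue concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say how freshness is computed, so the sketch assumes freshness is simply the time at which a local model finished training, and that each satellite contact offers a fixed bandwidth budget. The names `QueuedModel`, `FreshnessQueue`, `push`, and `drain` are hypothetical.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class QueuedModel:
    # Heap key: negated training timestamp, so newer models sort first.
    # ASSUMPTION: "freshness" is the training completion time.
    neg_trained_at: float
    device_id: str = field(compare=False)
    size_mb: float = field(compare=False)


class FreshnessQueue:
    """Hypothetical queue that sends the freshest models first."""

    def __init__(self):
        self._heap = []

    def push(self, device_id, trained_at, size_mb):
        heapq.heappush(self._heap, QueuedModel(-trained_at, device_id, size_mb))

    def drain(self, budget_mb):
        """Return device ids whose models fit in one contact's budget."""
        sent, deferred = [], []
        while self._heap:
            item = heapq.heappop(self._heap)
            if item.size_mb <= budget_mb:
                budget_mb -= item.size_mb
                sent.append(item.device_id)
            else:
                deferred.append(item)  # too large for the remaining budget
        for item in deferred:
            heapq.heappush(self._heap, item)  # retry at the next contact
        return sent


queue = FreshnessQueue()
queue.push("a", trained_at=10.0, size_mb=40)
queue.push("b", trained_at=30.0, size_mb=40)
queue.push("c", trained_at=20.0, size_mb=40)
first_contact = queue.drain(budget_mb=100)
second_contact = queue.drain(budget_mb=100)
```

With a 100 MB budget, only two of the three 40 MB models fit in the first contact, so the two most recently trained ones are sent and the oldest waits for the next contact. A real system would likely also replace a device's stale queued model when a newer one arrives, which this sketch omits.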

Paper Structure (31 sections, 11 equations, 12 figures, 1 table, 1 algorithm)

This paper contains 31 sections, 11 equations, 12 figures, 1 table, 1 algorithm.

Figures (12)

  • Figure 1: Paradigm and challenges in satellite-assisted FL.
  • Figure 2: The uplink in the LEO satellite network represents a significant transmission bottleneck. One subfigure illustrates our experimental setup, while the other presents the resulting CDF.
  • Figure 3: The impact of data heterogeneity. The CIFAR-100 dataset [CIFAR10] is distributed among 10 edge devices, with local labels following a Dirichlet distribution (a smaller value of $\alpha$ indicates greater heterogeneity) [chen2022towards]. The experimental parameters are consistent with the evaluation section.
  • Figure 4: The impact of bandwidth stragglers. Local updates are aggregated asynchronously to form the global model, which is then evaluated. Stragglers have a $90\%$ chance of communication blockage or disconnection with the server and retry every $30$ minutes.
  • Figure 5: The impact of uneven training. The local update period is set to $30$ minutes. Normal devices have a compute power of $4$ GFLOPS (the performance of a Raspberry Pi 4 Model B [basford2020performance]), while undertrained devices have only $1/5$ of that. The model is ResNet-50 with a batch size of $128$. Scale balance (scale anti-balance) increases the local learning rates of undertrained (normal) devices by $1.5\times$.
  • ...and 7 more figures
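
The Dirichlet label split mentioned in the Figure 3 caption can be illustrated with a short sketch. This is one common way to build such a split, not necessarily the procedure used in the paper: for each class, a proportion vector over devices is drawn from a symmetric Dirichlet distribution with parameter $\alpha$, and that class's samples are divided accordingly. The function names and the synthetic labels below are assumptions made for illustration, and only the Python standard library is used.

```python
import random
from collections import defaultdict


def dirichlet_sample(alpha, k, rng):
    """Draw one sample from a symmetric Dirichlet(alpha) over k categories."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    if total == 0.0:  # guard against underflow for very small alpha
        return [1.0 / k] * k
    return [d / total for d in draws]


def partition_by_label(labels, num_devices, alpha, seed=0):
    """Split sample indices across devices, class by class.

    ASSUMPTION: proportions are drawn per class; the paper may instead
    draw a label distribution per device.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    shards = [[] for _ in range(num_devices)]
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        props = dirichlet_sample(alpha, num_devices, rng)
        start, cumulative = 0, 0.0
        for dev in range(num_devices):
            cumulative += props[dev]
            if dev == num_devices - 1:
                end = len(idxs)  # last device takes the remainder
            else:
                end = min(len(idxs), max(start, round(cumulative * len(idxs))))
            shards[dev].extend(idxs[start:end])
            start = end
    return shards


# Synthetic stand-in for 100-class labels (not the real dataset).
labels = [i % 100 for i in range(5000)]
shards = partition_by_label(labels, num_devices=10, alpha=0.5)
```

Every sample index lands on exactly one device. Lowering `alpha` concentrates each class on fewer devices, which matches the caption's note that a smaller $\alpha$ means greater heterogeneity.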