FedMeld: A Model-dispersal Federated Learning Framework for Space-ground Integrated Networks

Qian Chen, Xianhao Chen, Kaibin Huang

TL;DR

FedMeld introduces an infrastructure-free federated learning framework for space-ground integrated networks that uses the store-carry-forward mobility of satellites to disperse and mix model parameters across adjacent regions. The authors derive convergence bounds for both full and partial participation, and formulate a joint staleness control and mixing ratio (SC-MR) optimization that minimizes training loss under latency constraints, yielding a closed-form solution for the inter-region round interval and a semi-closed-form solution for the mixing ratio. In simulations on CIFAR-10 and MNIST with Starlink-like constellations, FedMeld achieves higher accuracy at lower communication cost than centralized and ISL-based baselines, while remaining robust to data heterogeneity. The work highlights a practical path to global FL in satellite-enabled networks, balancing latency, bandwidth, and heterogeneity considerations, and opens avenues for region-specific timing and adaptive mixing strategies.

Abstract

To bridge the digital divide, space-ground integrated networks (SGINs) are expected to deliver artificial intelligence (AI) services to every corner of the world. One key mission of SGINs is to support federated learning (FL) at a global scale. However, existing space-ground integrated FL frameworks involve ground stations or costly inter-satellite links, entailing excessive training latency and communication costs. To overcome these limitations, we propose an infrastructure-free federated learning framework based on a model dispersal (FedMeld) strategy, which exploits periodic movement patterns and store-carry-forward capabilities of satellites to enable parameter mixing across large-scale geographical regions. We theoretically show that FedMeld leads to global model convergence and quantify the effects of round interval and mixing ratio between adjacent areas on its learning performance. Based on the theoretical results, we formulate a joint optimization problem to design the staleness control and mixing ratio (SC-MR) for minimizing the training loss. By decomposing the problem into sequential SC and MR subproblems without compromising the optimality, we derive the round interval solution in a closed form and the mixing ratio in a semi-closed form to achieve the optimal latency-accuracy tradeoff. Experiments using various datasets demonstrate that FedMeld achieves superior model accuracy while significantly reducing communication costs as compared with traditional FL schemes for SGINs.
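The parameter mixing described in the abstract can be illustrated with a toy sketch. It is not the paper's algorithm: it assumes, purely for illustration, that regions form a ring, that a satellite carries each region's model to the next region, and that mixing is a convex combination governed by a single mixing ratio `alpha`. Local training and staleness are left out, and the names `mix` and `disperse_round` are hypothetical.

```python
def mix(local_model, carried_model, alpha):
    """Convex combination (1 - alpha) * local + alpha * carried.

    Assumption: this is one plausible reading of a "mixing ratio";
    the paper's exact update rule is not reproduced here.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * l + alpha * c
            for l, c in zip(local_model, carried_model)]


def disperse_round(region_models, alpha):
    """One dispersal round on a hypothetical ring of regions.

    Region r mixes its own model with the model carried over from
    region r - 1 (wrapping around the ring).
    """
    num_regions = len(region_models)
    # Snapshot first, so every region mixes with the model its
    # neighbor held before this round started.
    carried = [list(m) for m in region_models]
    return [mix(region_models[r], carried[(r - 1) % num_regions], alpha)
            for r in range(num_regions)]


# Three regions holding different one-parameter models.
models = [[0.0], [3.0], [6.0]]
for _ in range(50):
    models = disperse_round(models, alpha=0.5)
```

In this toy setting, repeated mixing pulls all regional models toward their common average without any ground station or inter-satellite link, which is the intuition behind infrastructure-free dispersal. How fast this happens in the actual framework depends on the round interval and mixing ratio analyzed in the paper.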

Paper Structure

This paper contains 29 sections, 7 theorems, 52 equations, 10 figures, 3 tables, 1 algorithm.

Key Result

Lemma 1

(Result of one-step SGD, Li2020n) Let Assumptions ass:smooth and ass:convex hold. Note that $\overline{\mathbf{v}}_{t+1} = \overline{\mathbf{w}}_t - \eta_t \mathbf{g}_t$ always holds. If $\eta_t \leq \frac{1}{4L}$, we have where ${\mathbf{g}_t} = \frac{1}{M}\sum\limits_{i \in {\mathcal{M}}} {\frac{1}{{{N_i}}}\sum\limits_{j \in {\mathcal{N}_i}} {\nabla {F_j}\left( {\mathbf{w}_{t,j},
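The update in Lemma 1 can be sketched numerically from the part of the statement that is visible: $\mathbf{g}_t$ averages client gradients within each region and then averages across the $M$ regions, and the virtual iterate moves by $-\eta_t \mathbf{g}_t$. The helper names `averaged_gradient` and `one_step` are hypothetical, the gradients are made-up numbers, and reading $L$ as the smoothness constant is an assumption.

```python
def averaged_gradient(grads_by_region):
    """g_t = (1/M) * sum_i (1/N_i) * sum_j grad_j.

    grads_by_region[i][j] is the gradient vector of client j in region i
    (hypothetical input layout).
    """
    num_regions = len(grads_by_region)          # M
    dim = len(grads_by_region[0][0])
    g = [0.0] * dim
    for client_grads in grads_by_region:
        n_i = len(client_grads)                 # N_i
        for grad in client_grads:
            for k in range(dim):
                g[k] += grad[k] / (n_i * num_regions)
    return g


def one_step(w_bar, g, eta, smoothness=None):
    """v_bar_{t+1} = w_bar_t - eta_t * g_t.

    If a smoothness constant L is supplied (assumed meaning of L),
    enforce the lemma's step-size condition eta_t <= 1 / (4 L).
    """
    if smoothness is not None and eta > 1.0 / (4.0 * smoothness):
        raise ValueError("step size violates eta <= 1 / (4 L)")
    return [w - eta * gk for w, gk in zip(w_bar, g)]


# Region 1 has two clients, region 2 has one client (made-up gradients).
grads = [[[1.0], [3.0]], [[4.0]]]
g_t = averaged_gradient(grads)
v_next = one_step([1.0], g_t, eta=0.5, smoothness=0.5)
```

Averaging within a region before averaging across regions gives each region equal weight regardless of how many clients it contains, which is what the nested $\frac{1}{M}\sum_i \frac{1}{N_i}\sum_j$ in the visible part of the lemma expresses.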

Figures (10)

  • Figure 1: Illustration of parameter mixing across different regions in the proposed FedMeld framework.
  • Figure 2: A diagram of parameter mixing in the FedMeld algorithm. Colored satellites represent SCF satellites, which carry aggregated models across adjacent regions and perform inter-region mixing upon arrival. Gray satellites denote non-SCF satellites, which only perform local aggregation (FedAvg) during their service period and do not transfer models to subsequent regions.
  • Figure 3: Test accuracy versus time on CIFAR-10 with IID clients (left), IID clusters with non-IID clients (center), and non-IID clusters with non-IID clients (right).
  • Figure 4: Test accuracy versus time on MNIST with IID clients (left), IID clusters with non-IID clients (center), and non-IID clusters with non-IID clients (right).
  • Figure 5: Effect of number of satellites participating in training per orbit on test accuracy and training time on CIFAR-10.
  • ...and 5 more figures

Theorems & Definitions (19)

  • Remark 1
  • Definition 1: Degree of non-IID
  • Lemma 1
  • Theorem 1: Convergence bound under full participation scheme
  • proof
  • Lemma 2
  • proof
  • Theorem 2
  • proof
  • Lemma 3
  • ...and 9 more