Graph Theory Meets Federated Learning over Satellite Constellations: Spanning Aggregations, Network Formation, and Performance Optimization

Fardis Nadimi, Payam Abdisarabshali, Jacob Chakareski, Nicholas Mastronarde, Seyyedali Hosseinalipour

TL;DR

Fed-Span advances federated learning for satellite constellations by replacing a single central aggregator with spanning-tree-based topologies formed over inter-satellite laser links. This enables over-space model aggregation that adapts to dynamic satellite networks. It develops continuous constraint representations (CCRs) to model MSTs and MoDSTs/MoDSFs, and it derives convergence bounds for non-convex loss functions under time-varying data and idle times. It also casts the joint topology-and-resource optimization as a signomial program, which is solved via a geometric programming approach with performance guarantees. In simulations across multiple datasets and constellations, the framework achieves faster convergence and lower energy consumption and latency. Its flexible clustering of satellites into virtual clusters (VCs) lets it balance ML performance against resource use. The work thus provides a practical, optimization-driven blueprint for energy- and latency-aware, ground-free federated learning in space.
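A signomial program is commonly converted into a sequence of geometric programs in two steps. First, each posynomial that sits in a denominator is condensed into a monomial using the weighted arithmetic-geometric mean inequality. Then the problem is re-solved around the updated point. The sketch below illustrates only that standard condensation step, under stated assumptions. It is not claimed to be Fed-Span's exact transformation, and the names `condense`, `eval_monomial`, and `eval_posynomial` are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical representation: a monomial c * prod_j x_j**a_j is (c, [a_1, ..., a_n]), c > 0.
Monomial = Tuple[float, List[float]]


def eval_monomial(m: Monomial, x: List[float]) -> float:
    c, a = m
    return c * math.prod(xj ** aj for xj, aj in zip(x, a))


def eval_posynomial(terms: List[Monomial], x: List[float]) -> float:
    return sum(eval_monomial(t, x) for t in terms)


def condense(terms: List[Monomial], x0: List[float]) -> Monomial:
    """Monomial lower bound of a posynomial that is tight at x0.

    With weights w_i = u_i(x0) / sum_k u_k(x0), the weighted AM-GM inequality
    gives sum_i u_i(x) >= prod_i (u_i(x) / w_i) ** w_i for all x > 0.
    """
    vals = [eval_monomial(t, x0) for t in terms]
    total = sum(vals)
    coeff = 1.0
    exps = [0.0] * len(x0)
    for (c, a), v in zip(terms, vals):
        w = v / total
        coeff *= (c / w) ** w
        for j, aj in enumerate(a):
            exps[j] += w * aj
    return coeff, exps


# Example: f(x, y) = 2x + 3y condensed around (1, 2).
f = [(2.0, [1.0, 0.0]), (3.0, [0.0, 1.0])]
g = condense(f, [1.0, 2.0])
print(eval_posynomial(f, [1.0, 2.0]), eval_monomial(g, [1.0, 2.0]))  # equal (8.0) up to rounding
print(eval_posynomial(f, [2.0, 1.0]), eval_monomial(g, [2.0, 1.0]))  # 7.0 vs. about 5.66
```

In a successive convex approximation loop, consider a constraint p(x)/q(x) ≤ 1 and replace the denominator posynomial q(x) with its condensed monomial. The result is a stricter, GP-compatible constraint, because the monomial never exceeds q. Each iterate therefore remains feasible for the original problem.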

Abstract

In this work, we introduce Fed-Span: federated learning with spanning aggregation over low Earth orbit (LEO) satellite constellations. Fed-Span aims to address critical challenges inherent to distributed learning in dynamic satellite networks. These include intermittent connectivity between satellites, heterogeneous computational capabilities across satellites, and time-varying satellite datasets. At its core, Fed-Span leverages minimum spanning tree (MST) and minimum spanning forest (MSF) topologies to introduce spanning model aggregation and dispatching processes for distributed learning. To formalize Fed-Span, we offer a fresh perspective on MST/MSF topologies by formulating them through a set of continuous constraint representations (CCRs). This integrates these topologies into a distributed learning framework for satellite networks. Using these CCRs, we obtain the energy consumption and latency of operations in Fed-Span. Moreover, we derive novel convergence bounds for Fed-Span that accommodate its key system characteristics and degrees of freedom (i.e., tunable parameters). Finally, we formulate a comprehensive optimization problem that jointly minimizes the model prediction loss, energy consumption, and latency of Fed-Span. We show that this problem is NP-hard and develop a systematic approach to transform it into a geometric programming formulation. This formulation is solved via successive convex optimization with performance guarantees. Through evaluations on real-world datasets, we demonstrate that Fed-Span outperforms existing methods, achieving faster model convergence, greater energy efficiency, and reduced latency.
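The sketch below builds an MST, and an MSF with one tree per virtual cluster, over a small inter-satellite link graph using Kruskal's algorithm. It is meant only as intuition for the topologies themselves. Fed-Span embeds these topologies in its optimization through CCRs rather than by running a combinatorial algorithm. The integer satellite ids, link costs, and cluster labels below are hypothetical.

```python
from typing import List, Optional, Tuple

# (weight, u, v): a hypothetical cost per inter-satellite link, e.g., per-hop latency.
Edge = Tuple[float, int, int]


class DisjointSet:
    """Union-find over satellite indices 0..n-1, used to detect cycles."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> bool:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False  # a and b are already connected: this link would close a cycle
        self.parent[rb] = ra
        return True


def kruskal(n: int, edges: List[Edge], cluster: Optional[List[int]] = None) -> List[Edge]:
    """Kruskal's algorithm: MST if cluster is None, else a spanning forest that
    keeps only intra-cluster links (one tree per cluster whose links connect it)."""
    ds = DisjointSet(n)
    chosen: List[Edge] = []
    for w, u, v in sorted(edges):
        if cluster is not None and cluster[u] != cluster[v]:
            continue  # skip links that cross virtual-cluster boundaries
        if ds.union(u, v):
            chosen.append((w, u, v))
    return chosen


# Five satellites with hypothetical link costs.
links = [(1.0, 0, 1), (2.0, 1, 2), (2.5, 0, 2), (1.5, 2, 3), (3.0, 3, 4), (0.5, 1, 4)]
mst = kruskal(5, links)                           # 4 links, total cost 5.0
msf = kruskal(5, links, cluster=[0, 0, 0, 1, 1])  # 3 links: one tree per VC
```

Restricting Kruskal to intra-cluster links is one simple way to obtain a forest whose trees align with given clusters. In Fed-Span, by contrast, the clustering itself is a degree of freedom.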

Paper Structure

This paper contains 56 sections, 6 theorems, 269 equations, 9 figures, 2 tables, 2 algorithms.

Key Result

Proposition 1

Under the intra-cluster and inter-cluster dissimilarity assumptions, and assuming the feasible values $\zeta^{\mathsf{Loc}}_{c,2} = 0, \forall c \in \mathcal{C}^{(k)}$, and $\zeta^{\mathsf{Glob}}_1 = 1$, let $\zeta^{\mathsf{Loc},\min}_{c,1}$ denote the minimum value of …

Figures (9)

  • Figure 1: (a) Star topology of FedL. (b) Satellites with different orbits (denoted by various colors) around the Earth, using inter- and intra-orbit links.
  • Figure 2: A schematic of the operations that take place in each global round of Fed-Span. (a) Phase 1: The global model is dispatched through a root-to-leaf continuum using a downward-directed tree structure. (b) Phase 2: Upon receiving the global model, satellites conduct local model training rounds. (c) Phase 3: Once each local training round is completed, a local model aggregation occurs, where satellites are organized into Virtual Clusters (VCs) and transfer their models to the respective VC root node via upward-directed trees, resembling a forest topology. (d) Phase 4: The aggregated models of VCs are disseminated back to their satellites using downward-directed trees. (e) Phase 5: After the execution of the last local training round of each global round, satellites' models are aggregated at the global root node through model transfers along a leaf-to-root continuum via an upward-directed tree. The newly aggregated global model is then dispatched across the satellites through Phase 1, which marks the start of the next global round.
  • Figure 3: Time instances and durations of different operations in Fed-Span.
  • Figure 4: The lower bound obtained in Proposition 1 (i.e., the left-hand side of the bound) vs. the actual value of the right-hand side (i.e., $\zeta^{\mathsf{Glob}}_2$).
  • Figure 5: Performance comparisons between our method and the baselines in terms of (i) model test accuracy vs. latency/time (top row) and (ii) energy usage required to reach various model test accuracies (bottom row). Results span three datasets (Fashion-MNIST, CIFAR-10, and FMoW) under various non-iid data configurations determined by the Dirichlet parameter $\alpha^{\mathsf{Dri}}$. The same color coding is used in all plots (both line and bar plots), as described in the top legend. Lower $\alpha^{\mathsf{Dri}}$ means more heterogeneous data. In the bar plots, a $\times$ sign on top of a bar indicates that the respective algorithm could not reach the desired accuracy. The y-axes of the bar plots use a logarithmic scale because of the large gap between our method and the rest of the baselines. This gap is caused by the use of fast optical links between the satellites, whose data rate can reach 30 Gbps. Results show that our method consistently outperforms the baselines. The performance gap widens as the datasets become more complex (i.e., CIFAR-10 and FMoW) and as the data distribution becomes more non-iid. This is because, for challenging datasets, the choice of training strategy and resource allocation plays a more prominent role in model training accuracy.
  • ...and 4 more figures
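Phases 1, 3, 4, and 5 in Figure 2 move models along root-to-leaf and leaf-to-root trees. The sketch below computes a leaf-to-root aggregate and its transmission latency on a rooted tree, plus the latency of a root-to-leaf dispatch. It rests on several simplifying assumptions rather than Fed-Span's latency and energy model:

- models are averaged with sample-count weights (FedAvg-style);
- every hop takes the same fixed transmission time;
- siblings transmit in parallel;
- computation delay is ignored.

All function names are hypothetical.

```python
from typing import Dict, List, Tuple

Model = List[float]  # hypothetical flat parameter vector


def children_of(parent: Dict[int, int]) -> Dict[int, List[int]]:
    """Invert a child -> parent map into parent -> children lists."""
    kids: Dict[int, List[int]] = {}
    for node, par in parent.items():
        kids.setdefault(par, []).append(node)
    return kids


def aggregate_up(root: int, parent: Dict[int, int], models: Dict[int, Model],
                 n_samples: Dict[int, int], hop_time: float) -> Tuple[Model, float]:
    """Leaf-to-root aggregation: each node forwards the sample-weighted sum of its
    subtree's models; a node transmits only after all of its children have arrived."""
    kids = children_of(parent)

    def visit(node: int) -> Tuple[Model, int, float]:
        wsum = [n_samples[node] * p for p in models[node]]
        count = n_samples[node]
        ready = 0.0  # time at which this node holds its full subtree sum
        for child in kids.get(node, []):
            c_sum, c_count, c_ready = visit(child)
            wsum = [a + b for a, b in zip(wsum, c_sum)]
            count += c_count
            ready = max(ready, c_ready + hop_time)
        return wsum, count, ready

    total, count, latency = visit(root)
    return [s / count for s in total], latency


def dispatch_latency(root: int, parent: Dict[int, int], hop_time: float) -> float:
    """Root-to-leaf dispatch: latency equals tree depth times the per-hop time."""
    kids = children_of(parent)

    def depth(node: int) -> int:
        return max((1 + depth(c) for c in kids.get(node, [])), default=0)

    return depth(root) * hop_time


# Hypothetical per-hop time: a 1M-parameter float32 model over a 30 Gbps laser link.
hop = 32 * 1_000_000 / 30e9  # about 1.07 ms
tree = {1: 0, 2: 0, 3: 1, 4: 1}  # child -> parent, root is satellite 0
models = {i: [float(i)] for i in range(5)}
avg, up_time = aggregate_up(0, tree, models, {0: 1, 1: 1, 2: 2, 3: 1, 4: 1}, hop)
down_time = dispatch_latency(0, tree, hop)  # both latencies equal 2 * hop here
```

Under these assumptions, a shallower tree shortens both dispatch and aggregation. This is one intuition for why the topology itself is worth optimizing jointly with resources.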
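Figure 5 controls data heterogeneity through the Dirichlet parameter $\alpha^{\mathsf{Dri}}$. This summary does not give the exact partitioning procedure, so the sketch below assumes a widely used recipe. For every class, it draws a vector of proportions from a symmetric Dirichlet distribution over the satellites and splits that class's samples accordingly. The sketch uses only the standard library and samples the Dirichlet by normalizing Gamma draws. The names `sample_dirichlet` and `dirichlet_partition` are hypothetical.

```python
import random
from typing import Dict, List


def sample_dirichlet(alpha: float, k: int, rng: random.Random) -> List[float]:
    """Symmetric Dirichlet(alpha, ..., alpha) via normalized Gamma(alpha, 1) draws."""
    while True:
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
        total = sum(draws)
        if total > 0.0:  # guard against underflow for very small alpha
            return [d / total for d in draws]


def dirichlet_partition(labels: List[int], n_clients: int, alpha: float,
                        seed: int = 0) -> List[List[int]]:
    """Split sample indices across clients; smaller alpha gives more skewed label mixes."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    shards: List[List[int]] = [[] for _ in range(n_clients)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        props = sample_dirichlet(alpha, n_clients, rng)
        start, acc = 0, 0.0
        for client, p in enumerate(props):
            acc += p
            # The last client takes the remainder so every index is assigned once.
            end = len(idxs) if client == n_clients - 1 else round(acc * len(idxs))
            shards[client].extend(idxs[start:end])
            start = end
    return shards


# Hypothetical toy labels: 10 classes x 50 samples, split over 8 satellites.
labels = [i % 10 for i in range(500)]
skewed = dirichlet_partition(labels, n_clients=8, alpha=0.1)
milder = dirichlet_partition(labels, n_clients=8, alpha=10.0)
```

With a small alpha such as 0.1, each satellite typically ends up holding only a few classes. With a large alpha such as 10.0, class mixes are typically close to uniform, matching the caption's note that lower $\alpha^{\mathsf{Dri}}$ means more heterogeneous data.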

Theorems & Definitions (17)

  • Remark 1: Multi-Granularity Representation of Network Operations
  • Remark 2: Interpretation of Continuous Constraint Representations (CCRs)
  • Remark 3: Natural Cycle Removal with Delay and Energy Consideration
  • Definition 1: Local Data Variability
  • Definition 2: Model Drift
  • Proposition 1: Bounding the Dissimilarity of Loss Functions across VCs
  • proof
  • Remark 4: Interpretation of Proposition 1
  • Theorem 1: General Convergence Behavior of Fed-Span
  • proof
  • ...and 7 more