Scheduling for On-Board Federated Learning with Satellite Clusters
Nasrin Razmi, Bho Matthiesen, Armin Dekorsy, Petar Popovski
TL;DR
This work tackles federated learning (FL) over satellite constellations with intermittent ground station (GS) connectivity. It introduces a two-tier scheduling framework, comprising Global Update (GU) at the GS and Cluster Update (CU) within each orbit cluster, that exploits both per-satellite visibility and cumulative orbit-level visibility to time global updates and local training. GU selects the earliest feasible update times $t_n$ using per-cluster demand and finish times, while CU adapts the number of local epochs $I_{n,p}$ to the time available in each time slot and coordinates parameter exchange over intra-orbit inter-satellite links (ISLs). Experiments on a Walker Delta constellation with CIFAR-10 show that the scheduler improves test accuracy and reduces wall-clock time compared to non-scheduled baselines, highlighting its practical value for robust on-board FL in space missions.
Abstract
Mega-constellations of small satellites have evolved into a source of massive amounts of valuable data. To manage this data efficiently, on-board federated learning (FL) enables satellites to train a machine learning (ML) model collaboratively without having to share the raw data. This paper introduces a scheme for scheduling on-board FL for constellations connected with intra-orbit inter-satellite links. The proposed scheme utilizes the predictable visibility pattern between satellites and the ground station (GS), both at the individual satellite level and cumulatively within the entire orbit, to mitigate intermittent connectivity and make the best use of the available time. To this end, two distinct schedulers are employed: one for coordinating the FL procedures among orbits, and the other for controlling those within each orbit. These two schedulers cooperatively determine the appropriate time to perform global updates at the GS and then allocate a suitable duration to the satellites within each orbit for local training, proportional to the usable time until the next global update. This scheme leads to improved test accuracy within a shorter time.
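The idea of sizing local training to the time left before the next global update can be illustrated with a small sketch. This is not the paper's scheduler, which the abstract does not specify in detail. The function names, the per-epoch duration `epoch_s`, the exchange overhead `exchange_s`, and the list of visibility windows are all hypothetical and chosen only for illustration.

```python
def local_epochs(available_s, epoch_s, exchange_s=0.0, max_epochs=None):
    """Whole local epochs that fit before the next global update.

    available_s: time until the next global update (assumed known from orbit prediction)
    epoch_s:     assumed duration of one local training epoch
    exchange_s:  assumed time reserved for exchanging parameters over ISLs
    """
    if epoch_s <= 0:
        raise ValueError("epoch_s must be positive")
    usable = available_s - exchange_s
    if usable <= 0:
        return 0  # no time left for training in this slot
    n = int(usable // epoch_s)
    if max_epochs is not None:
        n = min(n, max_epochs)
    return n


def earliest_visible_time(ready_s, windows):
    """Earliest time >= ready_s inside any (start, end) visibility window, else None."""
    for start, end in sorted(windows):
        if end <= ready_s:
            continue  # this window closed before the cluster was ready
        return max(start, ready_s)
    return None


if __name__ == "__main__":
    # A cluster with 100 s until the next update, 30 s per epoch, 10 s for exchange.
    print(local_epochs(100, 30, exchange_s=10))
    # A cluster ready at t = 50 s, visible to the GS during [0, 40) and [60, 80).
    print(earliest_visible_time(50, [(0, 40), (60, 80)]))
```

Under these assumptions, a cluster with more usable time before the next global update trains for more epochs, and an update is placed at the first moment the cluster is both finished and visible to the GS.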
