
Multi-Agent Reinforcement Learning for Autonomous Multi-Satellite Earth Observation: A Realistic Case Study

Mohamad A. Hady, Siyi Hu, Mahardhika Pratama, Jimmy Cao, Ryszard Kowalczyk

TL;DR

This paper addresses autonomous coordination for Earth Observation (EO) missions by framing multi-satellite planning as a Dec-POMDP with resource constraints and partial observability. It evaluates PPO-based single-satellite RL and three MARL algorithms (MAPPO and HAPPO, which follow centralised training with decentralised execution (CTDE), and IPPO) in a near-realistic Basilisk/BSK-RL simulator across Walker-delta and Cluster orbits, with a 2,000-target objective. The results show that centralised PPO struggles due to non-stationarity, whereas MAPPO and HAPPO achieve stronger coordination and resilience to uncertainty, and IPPO provides a competitive decentralised baseline; data storage constraints have a pronounced impact on performance. The work offers practical guidelines for learning coordinated policies in decentralised EO missions and outlines a path toward handling heterogeneity and larger constellations in future deployments.

Abstract

The exponential growth in the number of Low Earth Orbit (LEO) satellites has revolutionised Earth Observation (EO) missions, which address challenges in climate monitoring, disaster management, and more. However, autonomous coordination in multi-satellite systems remains a fundamental challenge. Traditional optimisation approaches struggle to meet the real-time decision-making demands of dynamic EO missions, motivating the use of Reinforcement Learning (RL) and Multi-Agent Reinforcement Learning (MARL). In this paper, we investigate RL-based autonomous EO mission planning by modelling single-satellite operations and extending them to multi-satellite constellations using MARL frameworks. We address key challenges, including energy and data storage limitations, uncertainties in satellite observations, and the complexities of decentralised coordination under partial observability. Using a near-realistic satellite simulation environment, we evaluate the training stability and performance of state-of-the-art RL and MARL algorithms: PPO, IPPO, MAPPO, and HAPPO. Our results demonstrate that MARL can effectively balance imaging and resource management while addressing non-stationarity and reward interdependency in multi-satellite coordination. The insights gained from this study provide a foundation for autonomous satellite operations, offering practical guidelines for improving policy learning in decentralised EO missions.
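To make the difference between the evaluated algorithm families concrete, here is a minimal sketch of what each satellite's actor and critic are conditioned on. In all three MARL methods each policy acts on its own local observation at execution time; IPPO also trains each critic on local observations only, while MAPPO and HAPPO train a centralised critic on a global state (HAPPO additionally updates agents' policies sequentially, which is not shown). The observation contents and the choice of "concatenate all observations" as the global state are assumptions for illustration, not the paper's or BSK-RL's actual observation space; the function names are hypothetical.

```python
from typing import Dict

import numpy as np

Obs = Dict[str, np.ndarray]


def actor_inputs(local_obs: Obs) -> Obs:
    """Decentralised execution (IPPO, MAPPO, HAPPO): each policy acts on its own observation."""
    return {agent: obs.copy() for agent, obs in local_obs.items()}


def ippo_critic_inputs(local_obs: Obs) -> Obs:
    """IPPO: each agent's value function is also trained on its local observation only."""
    return {agent: obs.copy() for agent, obs in local_obs.items()}


def centralised_critic_inputs(local_obs: Obs) -> Obs:
    """MAPPO/HAPPO-style centralised critic: during training, value estimates are
    conditioned on a global state. Here the global state is assumed to be the
    concatenation of all agents' observations in a fixed (sorted) agent order."""
    global_state = np.concatenate([local_obs[agent] for agent in sorted(local_obs)])
    return {agent: global_state for agent in local_obs}


# Hypothetical 3-value observation per satellite: battery fraction,
# storage-used fraction, and an index of the next target opportunity window.
obs = {f"Sat-{i}": np.array([1.0 - 0.1 * i, 0.2 * i, float(i)]) for i in range(1, 5)}

actors = actor_inputs(obs)
ippo = ippo_critic_inputs(obs)
central = centralised_critic_inputs(obs)
print(actors["Sat-1"].shape, ippo["Sat-1"].shape, central["Sat-1"].shape)  # (3,) (3,) (12,)
```

The centralised critic sees every satellite's resource levels, which is what lets it attribute reward correctly when one satellite's capture changes another's outcome; at deployment only the local-observation actors are needed.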

Paper Structure

This paper contains 12 sections, 5 equations, and 3 figures.

Figures (3)

  • Figure 1: Multi-Satellite Cluster Image Capturing Task Scenario: Sat-1 to Sat-4 in a cluster constellation share the same four target opportunity windows. Sat-1, as the leading satellite, has first access to the ground station and captures Target-1 ahead of the others. The remaining satellites must capture different targets to ensure that each image is unique. Because each satellite's best choice depends on what the others have already captured, this behaviour introduces non-stationarity into the multi-agent system. Each satellite has its own battery (Batt.) and data storage (Mem.), which may be at different levels at the same time step $t$. This scenario highlights the importance of coordination and efficient resource management among satellites in autonomous EO missions.
  • Figure 2: Multi-Satellite Learning Performance Under Cluster and Walker-Delta Orbits: Evaluated with both default and limited resources, including Battery ($B$), Data Storage ($D$), Baud Rate ($Bdr$), captured image sizes ($img$), and the presence of randomness.
  • Figure 3: Target Capturing Action Frequencies Across Different Satellites and Algorithms: Evaluated under varying data storage capacities ($D$), with Sat-1 to Sat-4 having 5, 10, 250, and 500 GB, respectively.
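The scenario in Figure 1 and the storage settings in Figure 3 can be illustrated with a toy reward-resolution step. This is a minimal sketch under stated assumptions, not the paper's reward function or the BSK-RL API: satellites are resolved in order of access (leader first), a unique capture earns 1.0, a duplicate still uses storage but earns nothing, and a capture fails if the image does not fit. The `Satellite` class, `step_rewards` function, and the 4 GB image size are hypothetical; only the 5, 10, 250, and 500 GB capacities come from Figure 3.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Satellite:
    name: str
    storage_capacity_gb: float
    stored_gb: float = 0.0

    def can_store(self, image_gb: float) -> bool:
        return self.stored_gb + image_gb <= self.storage_capacity_gb


def step_rewards(sats: List[Satellite], actions: Dict[str, int],
                 captured: Set[int], image_gb: float) -> Dict[str, float]:
    """Resolve one decision step; list order is assumed to be access order (leader first)."""
    rewards: Dict[str, float] = {}
    for sat in sats:
        target = actions[sat.name]
        if not sat.can_store(image_gb):
            rewards[sat.name] = 0.0  # storage full: the capture fails
            continue
        sat.stored_gb += image_gb  # the image occupies storage either way
        if target in captured:
            rewards[sat.name] = 0.0  # duplicate of an earlier capture: no reward
        else:
            captured.add(target)
            rewards[sat.name] = 1.0  # first (unique) capture of this target
    return rewards


# Capacities from Figure 3; the 4 GB image size is a hypothetical value.
sats = [Satellite(f"Sat-{i}", cap) for i, cap in enumerate([5.0, 10.0, 250.0, 500.0], start=1)]
captured: Set[int] = set()

# Step 1: every satellite targets Target-1, so only the leader is rewarded.
r1 = step_rewards(sats, {"Sat-1": 1, "Sat-2": 1, "Sat-3": 1, "Sat-4": 1}, captured, 4.0)
# Step 2: Sat-1 (5 GB) is now too full, so Target-2 is left for Sat-4.
r2 = step_rewards(sats, {"Sat-1": 2, "Sat-2": 3, "Sat-3": 4, "Sat-4": 2}, captured, 4.0)
print(r1)
print(r2)
```

Even in this toy setting, one satellite's reward depends on the others' earlier actions and on its own remaining storage, which is the reward interdependency and storage sensitivity the captions describe.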