Multi-Agent Reinforcement Learning for Heterogeneous Satellite Cluster Resources Optimization
Mohamad A. Hady, Siyi Hu, Mahardhika Pratama, Zehong Cao, Ryszard Kowalczyk
TL;DR
The paper tackles resource optimization for a heterogeneous satellite cluster performing autonomous Earth Observation (EO) by framing the problem as a cooperative multi-agent reinforcement learning (MARL) task under partial observability and limited onboard resources. It develops a structured modeling framework that captures heterogeneity through agent-specific capabilities, formulates the task as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP), and trains policies with centralized training and decentralized execution (CTDE). The study compares MAPPO, HATRPO, and HAPPO in near-realistic Basilisk/BSK-RL simulations. Heterogeneous-agent methods achieve robust coordination and efficient resource use: HATRPO excels under tight resource constraints, and HAPPO performs strongly in stable conditions. The results support scalable, autonomous EO mission planning under heterogeneous and dynamic conditions. They also point to future work on larger clusters and on incorporating domain knowledge to further mitigate non-stationarity.
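For reference, the standard Dec-POMDP tuple and cooperative objective behind this formulation are written out below. This is the textbook form, not necessarily the paper's notation; in particular, how agent-specific capabilities and onboard resource states (energy, memory) enter $\mathcal{S}$, $\mathcal{A}^i$, and $\Omega^i$ is an assumption left abstract here. In the setting described, $\mathcal{N}$ has $n = 3$ agents (two optical satellites and one SAR satellite).

$$
\mathcal{M} = \big\langle \mathcal{N}, \mathcal{S}, \{\mathcal{A}^i\}_{i\in\mathcal{N}}, P, R, \{\Omega^i\}_{i\in\mathcal{N}}, O, \gamma \big\rangle,
\qquad
J(\boldsymbol{\pi}) = \mathbb{E}_{\boldsymbol{\pi}}\Big[\sum_{t=0}^{\infty} \gamma^{t} R(s_t, \mathbf{a}_t)\Big],
$$
$$
\mathbf{a}_t = (a_t^1, \dots, a_t^n), \qquad a_t^i \sim \pi^i(\cdot \mid \tau_t^i),
$$

Here $P(s_{t+1} \mid s_t, \mathbf{a}_t)$ is the transition kernel, $R$ is the single team reward shared by all agents, $O$ maps states to per-agent observations $o_t^i \in \Omega^i$, and $\tau_t^i$ is agent $i$'s local action-observation history. Under CTDE, a critic such as $V_\phi(s_t)$ may condition on the global state during training, while each deployed policy $\pi^i$ sees only $\tau_t^i$.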
Abstract
This work investigates resource optimization in heterogeneous satellite clusters performing autonomous Earth Observation (EO) missions using Reinforcement Learning (RL). In the proposed setting, two optical satellites and one Synthetic Aperture Radar (SAR) satellite operate cooperatively in low Earth orbit to capture ground targets while managing their limited onboard resources efficiently. Traditional optimization methods struggle with the real-time, uncertain, and decentralized nature of EO operations, which motivates the use of RL and Multi-Agent Reinforcement Learning (MARL) for adaptive decision-making. This study systematically formulates the optimization problem, progressing from single-satellite to multi-satellite scenarios and addressing key challenges including energy and memory constraints, partial observability, and agent heterogeneity arising from diverse payload capabilities. Using a near-realistic simulation environment built on the Basilisk and BSK-RL frameworks, we evaluate the performance and stability of state-of-the-art MARL algorithms: MAPPO, HAPPO, and HATRPO. Results show that MARL enables effective coordination across heterogeneous satellites, balancing imaging performance and resource utilization while mitigating non-stationarity and inter-agent reward coupling. The findings provide practical insights into scalable, autonomous satellite operations and lay a foundation for future research on intelligent EO mission planning under heterogeneous and dynamic conditions.
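To make the difference between the compared algorithms concrete, the sketch below contrasts the per-agent objectives of MAPPO and HAPPO. It does not reproduce the paper's implementation. Both methods maximize a PPO-style clipped surrogate. MAPPO gives every agent the same advantage from a centralized critic. HAPPO updates agents one at a time and reweights each agent's advantage by the policy ratios of the agents already updated. HATRPO swaps clipping for a KL-constrained trust-region step and is not shown. The function names, the fixed ratio arrays, and the agent labels are hypothetical, and the numbers are illustrative only, not results from the paper. In real HAPPO, each ratio is recomputed after that agent's own gradient steps, and the update order is re-randomized every iteration.

```python
import numpy as np


def clipped_surrogate(ratio, weighted_adv, eps=0.2):
    """PPO clipped surrogate: mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    unclipped = ratio * weighted_adv
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * weighted_adv
    return float(np.mean(np.minimum(unclipped, clipped)))


def mappo_objectives(ratios, advantages, eps=0.2):
    """MAPPO-style: every agent is scored against the same shared advantage."""
    return [clipped_surrogate(r, advantages, eps) for r in ratios]


def happo_objectives(ratios, advantages, order, eps=0.2):
    """HAPPO-style sequential scheme (simplified with fixed ratio arrays).

    Agents are processed in `order`. Each agent's advantage is multiplied by
    the product of new/old ratios of the agents processed before it. Only the
    current agent's own ratio is clipped, not the accumulated weight.
    """
    weight = np.ones_like(advantages)
    objectives = {}
    for i in order:
        objectives[i] = clipped_surrogate(ratios[i], weight * advantages, eps)
        weight = weight * ratios[i]  # fold agent i's updated ratio into the weight
    return objectives


if __name__ == "__main__":
    # Hypothetical batch of 2 samples for 3 agents: optical-1, optical-2, SAR.
    ratios = [np.array([1.1, 0.9]), np.array([1.3, 1.0]), np.array([0.7, 1.05])]
    advantages = np.array([1.0, -1.0])
    print([round(v, 4) for v in mappo_objectives(ratios, advantages)])
    print({k: round(v, 4) for k, v in happo_objectives(ratios, advantages, [0, 1, 2]).items()})
```

The only structural change from MAPPO to HAPPO is the running `weight`. It lets later agents account for how earlier agents' policies have already moved. This reweighting is what HAPPO relies on to keep the sequential update monotonic for heterogeneous, non-parameter-sharing agents.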
