Dynamic Routing for Integrated Satellite-Terrestrial Networks: A Constrained Multi-Agent Reinforcement Learning Approach
Yifeng Lyu, Han Hu, Rongfei Fan, Zhi Liu, Jianping An, Shiwen Mao
TL;DR
The paper tackles routing in integrated satellite-terrestrial networks (ISTNs), where ground stations and low Earth orbit (LEO) satellites must collaboratively forward traffic under energy and packet-loss constraints. It formulates the problem as a constrained decentralized partially observable Markov decision process (Dec-POMDP). It then introduces CMADR, a constrained multi-agent reinforcement learning algorithm that uses Lagrangian relaxation to turn the constrained problem into a max-min objective. CMADR follows a centralized training with decentralized execution (CTDE) architecture. It combines per-agent actors, local cost critics for the energy constraints, and centralized reward and cost critics that guide the policy updates, while the Lagrange multipliers are adjusted adaptively during training. Experiments on the OneWeb and Telesat constellations show substantial packet delay reductions (at least 21% and 15%) while the energy and packet-loss requirements are met. This indicates robust performance under topology changes and network dynamics.
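As a hedged illustration of the Lagrangian relaxation described above, a generic constrained reinforcement learning formulation can be written as follows. The symbols are our own notation and not necessarily the paper's. J_R(π) is the expected return of policy π (for example, negative packet delay), J_{C_k}(π) is the expected k-th cost (for example, energy consumption or packet loss), and d_k is that cost's budget.

```latex
\begin{aligned}
&\max_{\pi}\; J_R(\pi)
  \quad \text{s.t.} \quad J_{C_k}(\pi) \le d_k, \qquad k = 1,\dots,K,\\[4pt]
&\mathcal{L}(\pi,\boldsymbol{\lambda}) = J_R(\pi) - \sum_{k=1}^{K} \lambda_k \bigl(J_{C_k}(\pi) - d_k\bigr),\\[4pt]
&\max_{\pi}\; \min_{\boldsymbol{\lambda} \ge \mathbf{0}}\; \mathcal{L}(\pi,\boldsymbol{\lambda}).
\end{aligned}
```

The inner minimization shows why the max-min problem recovers the constrained one. If some constraint is violated, letting the corresponding λ_k grow without bound drives the Lagrangian to minus infinity. If all constraints hold, every penalty term -λ_k(J_{C_k}(π) - d_k) is non-negative, so the minimum is reached at zero penalty and equals J_R(π).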
Abstract
The integrated satellite-terrestrial network (ISTN) system has grown significantly, offering seamless communication services in remote areas with limited terrestrial infrastructure. However, designing a routing scheme for ISTNs is exceedingly difficult. The additional ground stations increase complexity, and various constraints related to satellite service quality must also be satisfied. To address these challenges, we study packet routing in which ground stations and satellites jointly transmit packets, prioritizing fast communication while meeting energy-efficiency and packet-loss requirements. Specifically, we use the Lagrange method to formulate constrained packet routing as a max-min problem. We then propose CMADR, a novel constrained multi-agent reinforcement learning (MARL) dynamic routing algorithm that efficiently balances objective improvement and constraint satisfaction when updating the policy and the Lagrange multipliers. Finally, we conduct extensive experiments and an ablation study using the OneWeb and Telesat mega-constellations. Results demonstrate that CMADR reduces the packet delay by at least 21% and 15%, while meeting stringent energy consumption and packet loss rate constraints and outperforming several baseline algorithms.
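The joint update of the policy and the Lagrange multipliers can be illustrated with a toy projected gradient descent-ascent loop. This is a generic primal-dual scheme, not CMADR's actor-critic update. The scalar policy parameter `theta`, the reward J_R(theta) = theta (a stand-in for negative delay), the cost J_C(theta) = theta**2 (a stand-in for energy), the budget `d`, and the step sizes are all hypothetical. The policy parameter ascends the Lagrangian J_R - lam * (J_C - d). The multiplier `lam` grows while the cost exceeds the budget and is clipped at zero otherwise.

```python
"""Toy projected gradient descent-ascent on a Lagrangian (not CMADR itself).

Hypothetical setup: scalar policy parameter theta, reward J_R(theta) = theta
(stand-in for negative packet delay), cost J_C(theta) = theta**2 (stand-in
for energy consumption), and cost budget d.
"""


def reward_grad(theta: float) -> float:
    # d/dtheta of the hypothetical reward J_R(theta) = theta (constant).
    return 1.0


def cost(theta: float) -> float:
    # Hypothetical energy cost J_C(theta) = theta**2.
    return theta ** 2


def cost_grad(theta: float) -> float:
    # d/dtheta of J_C(theta).
    return 2.0 * theta


def primal_dual(d: float = 1.0, lr_theta: float = 0.01, lr_lam: float = 0.01,
                steps: int = 20_000) -> tuple[float, float]:
    """Seek the saddle point of L(theta, lam) = J_R - lam * (J_C - d)."""
    theta, lam = 0.0, 0.0
    for _ in range(steps):
        # Both quantities are evaluated at the current (theta, lam).
        grad_theta = reward_grad(theta) - lam * cost_grad(theta)
        violation = cost(theta) - d  # > 0 means the budget is exceeded
        # Primal step: gradient ascent on L with respect to theta.
        theta += lr_theta * grad_theta
        # Dual step: dL/dlam = -violation, so descending on lam adds the
        # violation; clip at 0 to keep the multiplier non-negative.
        lam = max(0.0, lam + lr_lam * violation)
    return theta, lam


if __name__ == "__main__":
    theta, lam = primal_dual()
    print(f"theta={theta:.4f}, lam={lam:.4f}, cost={cost(theta):.4f}")
```

With the default budget d = 1, the loop should settle near theta = 1 and lam = 0.5. At theta = 1 the cost exactly meets the budget, and lam = 0.5 balances the reward gradient (1) against lam times the cost gradient (2). In a MARL setting, the analytic gradients here would be replaced by estimates from actors and critics.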
