Dynamic Routing for Integrated Satellite-Terrestrial Networks: A Constrained Multi-Agent Reinforcement Learning Approach

Yifeng Lyu; Han Hu; Rongfei Fan; Zhi Liu; Jianping An; Shiwen Mao

TL;DR

The paper tackles routing in integrated satellite-terrestrial networks (ISTNs), where ground stations and LEO satellites must jointly forward traffic under energy and packet-loss constraints. It formulates the problem as a constrained decentralized partially observable Markov decision process (Dec-POMDP) and introduces CMADR, a constrained multi-agent reinforcement learning algorithm that solves a max-min objective obtained by Lagrangian relaxation. CMADR uses a centralized-training, decentralized-execution (CTDE) architecture with per-agent actors, local cost critics for the energy constraints, and centralized reward and cost critics that guide the policy updates; the Lagrange multipliers are adapted alongside the policy. Experiments on the OneWeb and Telesat constellations show delay reductions of at least 21% and 15% while meeting the energy and packet-loss requirements, demonstrating robust performance under topology changes and network dynamics.
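The TL;DR mentions adaptive Lagrange multipliers. A common way to adapt them in constrained RL is projected dual ascent: each multiplier grows while its constraint is violated and shrinks (never below zero) once the constraint is satisfied. The sketch below assumes that rule with a fixed step size; the class name `LagrangeMultipliers`, the constraint keys, and the threshold values are hypothetical, and CMADR's actual update may differ.

```python
from dataclasses import dataclass, field


@dataclass
class LagrangeMultipliers:
    """Hypothetical helper holding one nonnegative multiplier per constraint."""

    thresholds: dict[str, float]  # constraint name -> allowed level d_k
    lr: float = 0.01  # dual step size (assumed value)
    values: dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Start every multiplier at zero unless an initial value was given.
        for name in self.thresholds:
            self.values.setdefault(name, 0.0)

    def update(self, avg_costs: dict[str, float]) -> dict[str, float]:
        """Projected dual ascent: lambda_k <- max(0, lambda_k + lr * (J_Ck - d_k))."""
        for name, limit in self.thresholds.items():
            violation = avg_costs[name] - limit  # > 0 means the constraint is violated
            self.values[name] = max(0.0, self.values[name] + self.lr * violation)
        return dict(self.values)


# Usage with made-up thresholds: energy is over budget, packet loss is within budget.
lm = LagrangeMultipliers({"energy": 1.0, "loss_rate": 0.25}, lr=0.5)
print(lm.update({"energy": 1.5, "loss_rate": 0.125}))  # {'energy': 0.25, 'loss_rate': 0.0}
```

The projection onto zero keeps the penalty from turning into a bonus when a constraint has slack.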

Abstract

The integrated satellite-terrestrial network (ISTN) system has experienced significant growth, offering seamless communication services in remote areas with limited terrestrial infrastructure. However, designing a routing scheme for ISTN is exceedingly difficult, primarily because the additional ground stations increase complexity and because various constraints related to satellite service quality must be satisfied. To address these challenges, we study packet routing in which ground stations and satellites work jointly to transmit packets, prioritizing fast communication while meeting energy-efficiency and packet-loss requirements. Specifically, we formulate constrained packet routing as a max-min problem using the Lagrange method. We then propose a novel constrained multi-agent reinforcement learning (MARL) dynamic routing algorithm named CMADR, which efficiently balances objective improvement and constraint satisfaction when updating the policy and the Lagrange multipliers. Finally, we conduct extensive experiments and an ablation study on the OneWeb and Telesat mega-constellations. Results show that CMADR reduces packet delay by at least 21% and 15% while meeting stringent energy-consumption and packet-loss-rate constraints, outperforming several baseline algorithms.
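For reference, a generic Lagrangian max-min formulation of the kind the abstract describes can be written as follows, assuming $J_R(\theta)$ is the expected return of the joint policy with parameters $\theta$ (for example, with reward defined as negative delay), $J_{C_k}(\theta)$ is the expected cost for constraint $k$ (energy consumption, packet loss rate), $d_k$ is its threshold, and $\alpha, \beta$ are step sizes. This notation is illustrative and not taken from the paper.

$$
\max_{\theta}\ \min_{\boldsymbol{\lambda}\ge 0}\ \mathcal{L}(\theta,\boldsymbol{\lambda})
= J_R(\theta) - \sum_{k}\lambda_k\bigl(J_{C_k}(\theta) - d_k\bigr)
$$

$$
\theta \leftarrow \theta + \alpha\,\nabla_{\theta}\mathcal{L}(\theta,\boldsymbol{\lambda}),
\qquad
\lambda_k \leftarrow \max\!\bigl(0,\ \lambda_k + \beta\,(J_{C_k}(\theta) - d_k)\bigr)
$$

Because $\partial\mathcal{L}/\partial\lambda_k = -(J_{C_k}(\theta) - d_k)$, the descent step on $\lambda_k$ raises the multiplier whenever constraint $k$ is violated and lowers it toward zero otherwise, so the penalty tightens only when needed.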

Paper Structure (21 sections, 22 equations, 11 figures, 3 tables, 1 algorithm)

Introduction
Related Work
Static Routing Algorithms
Dynamic Routing Algorithms
System Model and Problem Formulation
Overall Architecture
Networking Model
Communication Delay
Energy Consumption
Packet Loss Rate
Problem Formulation
Constrained Multi-Agent Dynamic Routing Algorithm
Overview of Dec-POMDP
Algorithm Architecture
Training and Executing
...and 6 more sections

Figures (11)

Figure 1: (a) The architecture of ISTN; (b) workflow of the architecture: ① During each time slot, every satellite distributes its own state information to its four neighboring satellites and to reachable ground stations; ② Every satellite or ground station generates a routing scheme based on its own state information and the information received from other nodes; ③ Within each region served by a ground station, user messages are packetized and stored in the station's buffer; ④ Following its routing scheme, each ground station with packets in its buffer uploads them to a satellite; ⑤ Following its routing scheme, each satellite forwards data to the next satellite on the path; ⑥ Data is transmitted to the ground station; ⑦ Data is delivered to the designated user.
Figure 2: (a) Each ground station's observation includes the energy consumption and buffer usage of all connectable satellites, as well as the station's own energy consumption; (b) each LEO satellite's observation consists of its own energy consumption and buffer usage, as well as those of its four neighboring satellites.
Figure 3: Architecture of CMADR. At time slot $t$, each satellite and ground station forms its local observation ($o_i^t$ and $o_j^t$), which is fed to the local actor and the local cost critic. The global state $s$ is sent to the central reward critic and the central cost critic, which estimate the joint-state reward value and the joint-state cost value used to train the actor network. After the interaction, the environment returns the global reward $r^t$ and global cost $c^t$, as well as the next local observations $o_i^{t+1}$ and $o_j^{t+1}$ at time slot $t+1$. Using the stored transitions, the critic networks learn increasingly accurate value estimates and jointly train the actor network to reduce average delay while constraining energy consumption and packet loss rate.
Figure 4: (a) Accumulated reward, (b) average delay, (c) energy consumption, and (d) packet loss rate per episode over 300 episodes on Telesat. CMADR's reward curve rises while its average delay falls. Meanwhile, its energy consumption and packet loss rate decline and ultimately satisfy their respective constraints. CMADR performs best among all algorithms.
Figure 5: (a) Accumulated reward, (b) average delay, (c) energy consumption, and (d) packet loss rate per episode over 300 episodes on OneWeb. As on Telesat, CMADR's reward grows steadily while its average delay declines, and both energy consumption and packet loss rate trend downward. Overall, CMADR outperforms all other algorithms.
...and 6 more figures
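Step ① of Figure 1 has each satellite exchange state with four neighbors. A common way to model this in mega-constellation studies is a "+Grid" topology: two intra-plane links (the satellites ahead and behind in the same orbit) and two inter-plane links (the same slot in the adjacent planes). The sketch below assumes that topology with wrap-around in both directions; the function name `grid_neighbors` is hypothetical, and real polar constellations often lack inter-plane links across the seam between counter-rotating planes.

```python
def grid_neighbors(plane: int, slot: int, num_planes: int, sats_per_plane: int) -> list[tuple[int, int]]:
    """Four (plane, slot) neighbors of a satellite in an assumed +Grid topology with wrap-around."""
    return [
        (plane, (slot + 1) % sats_per_plane),  # next satellite in the same orbital plane
        (plane, (slot - 1) % sats_per_plane),  # previous satellite in the same orbital plane
        ((plane - 1) % num_planes, slot),      # same slot, adjacent plane on one side
        ((plane + 1) % num_planes, slot),      # same slot, adjacent plane on the other side
    ]


# Usage on a hypothetical 4-plane, 10-satellites-per-plane grid.
print(grid_neighbors(0, 0, num_planes=4, sats_per_plane=10))  # [(0, 1), (0, 9), (3, 0), (1, 0)]
```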
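Figure 2 specifies what each agent observes. Below is a minimal sketch of assembling those observations as flat feature vectors, assuming normalized scalar energy and buffer values. `NodeState`, the function names, and the zero-padding of the ground-station vector to a fixed `max_sats` length are hypothetical choices for illustration; a fixed-size input is simply convenient for a neural actor.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    """Hypothetical per-node state: normalized energy consumption and buffer usage."""

    energy: float
    buffer: float


def satellite_observation(own: NodeState, neighbors: list[NodeState]) -> list[float]:
    """Own (energy, buffer), then (energy, buffer) of the four neighbors, as in Figure 2(b)."""
    if len(neighbors) != 4:
        raise ValueError("expected exactly four neighboring satellites")
    obs = [own.energy, own.buffer]
    for n in neighbors:
        obs.extend([n.energy, n.buffer])
    return obs


def ground_station_observation(own_energy: float, reachable: list[NodeState], max_sats: int) -> list[float]:
    """Own energy, then (energy, buffer) of each connectable satellite, as in Figure 2(a).

    Zero-padded to max_sats satellites so the vector length is fixed (an assumption).
    """
    if len(reachable) > max_sats:
        raise ValueError("more reachable satellites than max_sats")
    obs = [own_energy]
    for s in reachable:
        obs.extend([s.energy, s.buffer])
    obs.extend([0.0, 0.0] * (max_sats - len(reachable)))
    return obs


# Usage with made-up values.
sat = satellite_observation(NodeState(0.5, 0.25), [NodeState(0.1, 0.2)] * 4)
print(len(sat))  # 10
print(ground_station_observation(0.75, [NodeState(0.5, 0.5)], max_sats=3))  # [0.75, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0]
```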
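Figure 3 describes reward and cost critics that jointly train the actor. One standard way to combine them in Lagrangian constrained RL is to weight the policy-gradient term by a penalized advantage $A_R - \sum_k \lambda_k A_{C_k}$. The sketch below assumes that combination and a simple REINFORCE-style surrogate on plain floats (in practice the log-probabilities would be autograd tensors); the function names and constraint keys are hypothetical, and this is not necessarily how CMADR computes its actor loss.

```python
def lagrangian_advantage(
    adv_reward: list[float],
    adv_costs: dict[str, list[float]],
    lambdas: dict[str, float],
) -> list[float]:
    """Per-sample A_R - sum_k lambda_k * A_Ck.

    adv_reward: advantages from the reward critic (assumed reward = negative delay).
    adv_costs: advantages from the cost critics, keyed by constraint name.
    lambdas: current nonnegative Lagrange multipliers, same keys as adv_costs.
    """
    combined = list(adv_reward)
    for name, adv in adv_costs.items():
        if len(adv) != len(combined):
            raise ValueError(f"length mismatch for constraint {name!r}")
        lam = lambdas[name]
        combined = [a - lam * c for a, c in zip(combined, adv)]
    return combined


def policy_loss(log_probs: list[float], advantages: list[float]) -> float:
    """REINFORCE-style surrogate to minimize: -mean(log pi(a|o) * A)."""
    if not advantages or len(log_probs) != len(advantages):
        raise ValueError("need equally sized, non-empty inputs")
    return -sum(lp * a for lp, a in zip(log_probs, advantages)) / len(advantages)


# Usage with made-up critic outputs and multipliers.
adv = lagrangian_advantage([1.0, 2.0], {"energy": [0.5, 1.0], "loss": [1.0, 0.0]}, {"energy": 2.0, "loss": 0.5})
print(adv)  # [-0.5, 0.0]
print(policy_loss([-1.0, -2.0], adv))  # -0.25
```

A large multiplier makes cost-increasing actions look worse to the actor, which is how the dual variables steer the policy toward feasibility.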
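Figures 4 and 5 track average delay, energy consumption, and packet loss rate per episode. Below is a minimal sketch of computing these metrics from a hypothetical per-packet log and checking them against constraint thresholds. `PacketRecord`, averaging delay over delivered packets only, and the threshold values are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class PacketRecord:
    """Hypothetical per-packet log entry for one episode."""

    delivered: bool
    delay_ms: float = 0.0  # end-to-end delay; only meaningful if delivered


def episode_metrics(packets: list[PacketRecord], energy_used: float) -> dict[str, float]:
    """Average delay over delivered packets, total energy, and packet loss rate."""
    delivered = [p for p in packets if p.delivered]
    avg_delay = sum(p.delay_ms for p in delivered) / len(delivered) if delivered else float("nan")
    loss_rate = 1.0 - len(delivered) / len(packets) if packets else 0.0
    return {"avg_delay_ms": avg_delay, "energy": energy_used, "loss_rate": loss_rate}


def constraints_met(metrics: dict[str, float], energy_budget: float, max_loss_rate: float) -> bool:
    """True if both the energy and packet-loss constraints hold for this episode."""
    return metrics["energy"] <= energy_budget and metrics["loss_rate"] <= max_loss_rate


# Usage with a made-up log: three delivered packets and one drop.
log = [PacketRecord(True, 10.0), PacketRecord(True, 30.0), PacketRecord(False), PacketRecord(True, 20.0)]
m = episode_metrics(log, energy_used=4.0)
print(m)  # {'avg_delay_ms': 20.0, 'energy': 4.0, 'loss_rate': 0.25}
print(constraints_met(m, energy_budget=5.0, max_loss_rate=0.2))  # False
```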
