Shaping Rewards, Shaping Routes: On Multi-Agent Deep Q-Networks for Routing in Satellite Constellation Networks

Manuel M. H. Roth; Anupama Hegde; Thomas Delamotte; Andreas Knopp

TL;DR

The paper investigates multi-agent deep Q-learning for routing in satellite constellation networks (SCNs), focusing on reward shaping and training convergence for the joint optimization of latency and load balancing. It contrasts fully decentralized (FD-MADRL) and centralized (CDRL) learning approaches, analyzes their performance in static and dynamic SCN scenarios, and shows that scalability and stability become harder to maintain as network size grows. A hybrid Centralized Learning–Decentralized Control (CL-DC) architecture is proposed to combine the coordination benefits of central learning with the scalability of local decision-making, aiming to improve end-to-end routing coherence. The findings highlight the trade-offs between decentralization and coordination in dynamic, non-uniform traffic environments and suggest directions for more robust, scalable DRL-based routing in non-terrestrial networks (NTN) and 6G-integrated networks.

Abstract

Effective routing in satellite mega-constellations has become crucial to facilitate the handling of increasing traffic loads, more complex network architectures, as well as the integration into 6G networks. To enhance adaptability as well as robustness to unpredictable traffic demands, and to solve dynamic routing environments efficiently, machine learning-based solutions are being considered. For network control problems, such as optimizing packet forwarding decisions according to Quality of Service requirements and maintaining network stability, deep reinforcement learning techniques have demonstrated promising results. For this reason, we investigate the viability of multi-agent deep Q-networks for routing in satellite constellation networks. We focus specifically on reward shaping and quantifying training convergence for joint optimization of latency and load balancing in static and dynamic scenarios. To address identified drawbacks, we propose a novel hybrid solution based on centralized learning and decentralized control.
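The abstract does not state the reward function itself, so the following is only a minimal sketch of what a shaped per-hop reward for joint latency and load balancing could look like. It assumes hop count as the latency proxy and link utilization in [0, 1] as the load measure (the quantities shown on the axes of Figure 3). The names RewardWeights, shaped_reward, and path_return, as well as all weight values, are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # All weights are illustrative assumptions, not values from the paper.
    hop_penalty: float = 1.0      # fixed cost per hop (latency proxy)
    load_penalty: float = 2.0     # cost scale for using a loaded link
    delivery_bonus: float = 10.0  # terminal reward on reaching the destination


def shaped_reward(link_load: float, reached_destination: bool,
                  weights: RewardWeights = RewardWeights()) -> float:
    """Reward for forwarding one packet over one inter-satellite link.

    link_load is the utilization of the chosen link, in [0, 1].
    """
    if not 0.0 <= link_load <= 1.0:
        raise ValueError("link_load must lie in [0, 1]")
    # Every hop costs a fixed amount, which discourages long paths.
    reward = -weights.hop_penalty
    # An additional cost grows linearly with the load on the chosen link.
    reward -= weights.load_penalty * link_load
    # A terminal bonus is granted when the packet arrives.
    if reached_destination:
        reward += weights.delivery_bonus
    return reward


def path_return(link_loads: list[float],
                weights: RewardWeights = RewardWeights()) -> float:
    """Undiscounted sum of per-hop rewards along a path ending at the destination."""
    total = 0.0
    last = len(link_loads) - 1
    for i, load in enumerate(link_loads):
        total += shaped_reward(load, i == last, weights)
    return total
```

With these assumed weights, a three-hop path over lightly loaded links (load 0.1 each) yields a higher return than a two-hop path over heavily loaded links (load 0.9 each), which illustrates how the ratio of the two penalty weights sets the trade-off between latency and load balancing.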

Paper Structure (10 sections, 4 equations, 5 figures)

Introduction
Routing in Satellite Constellation Networks
Reinforcement Learning Architectures
Reward shaping
Results
Simulation Environment
Performance Evaluation
Static Path Finding
State Evolution: Dynamic Link Loads
Discussion

Figures (5)

Figure 1: Abstracted sub-network representation: satellites (green) with ISLs (green-dotted), and link load (color according to load level).
Figure 2: Smoothed rewards over training for different DQN-based approaches in static scenarios, highlighting differences in convergence.
Figure 3: Comparison of FD-MADRL (in green) and a rule-based multi-cost approach (in red) in terms of latency (number of hops, on the x-axis) and the resulting maximum link load on the chosen path (y-axis).
Figure 4: Smoothed rewards over training for FD-MADRL with dynamic, mutable link loads.
Figure 5: Proposed extended architecture: actor-critic approach using Centralized Learning and Decentralized Control (CL-DC). As in FD-MADRL, actors make decisions locally but receive central guidance from a critic network.
