Asynchronous Risk-Aware Multi-Agent Packet Routing for Ultra-Dense LEO Satellite Networks
Ke He, Thang X. Vu, Le He, Lisheng Fan, Symeon Chatzinotas, Bjorn Ottersten
TL;DR
This work tackles routing in ultra-dense LEO satellite networks by formulating an asynchronous, decentralized packet-routing problem and solving it with PRIMAL, a principled risk-aware multi-agent reinforcement learning (MARL) framework. PRIMAL combines a distributional, event-driven semi-Markov formulation with primal-dual learning to constrain tail-end performance via conditional value-at-risk (CVaR), yielding two variants: PRIMAL-Avg, which constrains the expected cost, and PRIMAL-CVaR, which constrains the worst-case tail of the cost. The approach uses implicit quantile networks (IQN) for distributional cost modeling and a soft actor-critic backbone with entropy regularization, enabling scalable, synchronization-free learning across satellites. In simulations on a 1584-satellite constellation, it reduces queuing delay by over 70% and end-to-end delay by nearly 12 ms in loaded scenarios, relative to a recent risk-oblivious baseline. The findings demonstrate that accepting small detours to avoid hotspots can significantly reduce delays and improve robustness in dynamic mega-constellations.
Abstract
The rise of ultra-dense LEO constellations creates a complex and asynchronous network environment, driven by their massive scale, dynamic topologies, and significant delays. This complexity demands an adaptive packet routing algorithm that is asynchronous, risk-aware, and capable of balancing diverse and often conflicting QoS objectives in a decentralized manner. However, existing methods fail to address this need, as they typically rely on impractical synchronous decision-making, risk-oblivious approaches, or both. To close this gap, we introduce PRIMAL, an event-driven multi-agent routing framework that lets each satellite act independently on its own event-driven timeline while managing the risk of worst-case performance degradation via a principled primal-dual approach. This is achieved by enabling agents to learn the full cost distribution of the targeted QoS objectives and to constrain tail-end risks. Extensive simulations on a LEO constellation with 1584 satellites show that PRIMAL optimizes latency and balances load more effectively than existing methods. Compared to a recent risk-oblivious baseline, it reduces queuing delay by over 70% and end-to-end delay by nearly 12 ms in loaded scenarios. This is accomplished by resolving the core conflict between naive shortest-path finding and congestion avoidance, highlighting such autonomous risk-awareness as a key to robust routing.
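To make the distinction between expected-cost and tail-risk constraints concrete, the following is a minimal sketch, not the authors' implementation. It assumes the cost distribution is represented by plain empirical samples (PRIMAL itself learns it with IQN), and the delay values, risk level, threshold, and step size are all hypothetical. The helper names `cvar` and `dual_step` are introduced here for illustration only.

```python
import numpy as np

def cvar(costs, alpha):
    """Empirical CVaR: mean of the worst (1 - alpha) fraction of sampled costs."""
    costs = np.sort(np.asarray(costs, dtype=float))
    # Number of tail samples; at least one so the tail is never empty.
    k = max(1, int(np.ceil((1.0 - alpha) * costs.size)))
    return float(costs[-k:].mean())

def dual_step(lmbda, constraint_value, threshold, lr):
    """One projected gradient-ascent step on a Lagrange multiplier.

    The multiplier grows while the constraint is violated and is
    clipped at zero otherwise (generic primal-dual update).
    """
    return max(0.0, lmbda + lr * (constraint_value - threshold))

# Hypothetical queuing-delay samples (ms) for two next-hop choices.
shortest = np.array([2.0] * 9 + [30.0])  # usually fast, occasionally congested
detour = np.array([6.0] * 10)            # slightly longer, never congested

alpha = 0.9       # assumed risk level: look at the worst 10% of outcomes
threshold = 10.0  # assumed delay budget (ms)

for name, samples in [("shortest", shortest), ("detour", detour)]:
    mean_cost = float(samples.mean())
    tail_cost = cvar(samples, alpha)
    # Multiplier after one update from zero, for each constraint type.
    lam_avg = dual_step(0.0, mean_cost, threshold, lr=0.01)
    lam_cvar = dual_step(0.0, tail_cost, threshold, lr=0.01)
    print(name, mean_cost, tail_cost, lam_avg, lam_cvar)
```

In this toy setting the shortest hop has the lower mean delay, so an expected-cost constraint never penalizes it, whereas its CVaR exceeds the budget and the corresponding multiplier becomes positive. That is the sense in which a tail-risk constraint can favor a small detour around a hotspot.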
