
Multi-Satellite Beam Hopping and Power Allocation Using Deep Reinforcement Learning

Xia Xie, Kexin Fan, Wenfeng Deng, Nikolaos Pappas, Qinyu Zhang

TL;DR

The paper addresses traffic-driven beam hopping (BH) and power allocation in multi-satellite non-geostationary orbit (NGSO) systems. It formulates the problem as a Markov decision process and solves it with a proximal policy optimization (PPO) framework that uses a hybrid discrete-continuous action space. Two parallel policy networks with a shared base jointly optimize beam illumination patterns and beam powers. This exploits the time, space, and power degrees of freedom to balance throughput against the long-term cumulative average delay (LTCAD). The simulations use five satellites and 161 cells. In them, the proposed deep reinforcement learning (DRL) approach converges efficiently and achieves up to 8.9% higher throughput and up to 69.2% lower LTCAD than four benchmark methods, especially under time-varying, high-load traffic. The results indicate that DRL-based joint BH scheduling and power allocation is a practical option for scalable, interference-aware multi-NGSO constellations and next-generation broadband satellite systems.
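The two-network policy described above can be illustrated with a minimal, dependency-free Python sketch. Everything in it is an assumption for illustration, not the authors' architecture:

- the hypothetical class name `HybridPolicy`;
- a single tanh base layer;
- a softmax head over a hypothetical set of candidate illumination patterns;
- a Gaussian head with a state-independent log standard deviation for per-beam power;
- a softmax mapping that splits one satellite's total power budget `p_total` across its beams.

The returned log-probability is the sum of the discrete and continuous terms, which is what a PPO update would consume.

```python
import math
import random


def linear(x, W, b):
    """Dense layer: returns W @ x + b using plain Python lists."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def init_layer(n_out, n_in, rng):
    """Random weight matrix (n_out x n_in) and zero bias."""
    W = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


class HybridPolicy:
    """Hypothetical shared-base policy: a discrete head picks an illumination
    pattern index, a continuous Gaussian head proposes per-beam power."""

    def __init__(self, state_dim, n_patterns, n_beams, hidden=16, seed=0):
        self.rng = random.Random(seed)
        self.base = init_layer(hidden, state_dim, self.rng)           # shared base layer
        self.pattern_head = init_layer(n_patterns, hidden, self.rng)  # discrete head
        self.power_head = init_layer(n_beams, hidden, self.rng)       # continuous head (means)
        self.log_std = [-0.5] * n_beams  # assumption: fixed, state-independent log std

    def act(self, state, p_total):
        h = [math.tanh(v) for v in linear(state, *self.base)]

        # Discrete action: sample an illumination pattern from the categorical head.
        probs = softmax(linear(h, *self.pattern_head))
        pattern = self.rng.choices(range(len(probs)), weights=probs)[0]

        # Continuous action: sample raw Gaussian values, one per beam.
        mu = linear(h, *self.power_head)
        raw = [self.rng.gauss(m, math.exp(ls)) for m, ls in zip(mu, self.log_std)]

        # Assumption: a deterministic softmax maps raw samples to non-negative
        # powers that sum to the satellite's budget p_total.
        powers = [p_total * s for s in softmax(raw)]

        # Joint log-probability = log pi_d(pattern) + sum of Gaussian log-densities
        # of the raw (pre-mapping) samples.
        logp_d = math.log(probs[pattern])
        logp_c = sum(
            -0.5 * ((a - m) / math.exp(ls)) ** 2 - ls - 0.5 * math.log(2 * math.pi)
            for a, m, ls in zip(raw, mu, self.log_std)
        )
        return pattern, powers, logp_d + logp_c


policy = HybridPolicy(state_dim=4, n_patterns=3, n_beams=2, seed=1)
pattern, powers, logp = policy.act([0.2, 0.5, 0.1, 0.9], p_total=10.0)
```

The log-probability is computed on the raw Gaussian samples, and the power mapping is treated as part of the environment. This keeps the density simple, at the cost of not modelling per-beam power caps, which the paper's constraints may include.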

Abstract

In non-geostationary orbit (NGSO) satellite communication systems, effectively utilizing beam hopping (BH) technology is crucial for addressing uneven traffic demands. However, optimizing beam scheduling and resource allocation in multi-NGSO BH scenarios remains a significant challenge. This paper proposes a multi-NGSO BH algorithm based on deep reinforcement learning (DRL) to optimize beam illumination patterns and power allocation. By leveraging three degrees of freedom (i.e., time, space, and power), the algorithm aims to optimize the long-term throughput and the long-term cumulative average delay (LTCAD). The solution is based on proximal policy optimization (PPO) with a hybrid action space combining discrete and continuous actions. Using two policy networks with a shared base layer, the proposed algorithm jointly optimizes beam scheduling and power allocation. One network selects beam illumination patterns in the discrete action space, while the other manages power allocation in the continuous space. Simulation results show that the proposed algorithm significantly reduces LTCAD while maintaining high throughput in time-varying traffic scenarios. Compared with four benchmark methods, it improves network throughput by up to 8.9% and reduces LTCAD by up to 69.2%.
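For the PPO update itself, one common way to train a hybrid policy is to give each head its own clipped surrogate objective. Both heads use a shared advantage estimate, and the two losses are summed. The sketch below assumes that design; the paper may instead use a single joint probability ratio. The batch layout (`new_d`, `old_d`, `new_c`, `old_c`, `adv`) and the clip value `eps=0.2` are hypothetical.

```python
import math


def clipped_surrogate(new_logp, old_logp, adv, eps=0.2):
    """Per-sample PPO objective: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    ratio = math.exp(new_logp - old_logp)
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * adv, clipped * adv)


def hybrid_ppo_loss(batch, eps=0.2):
    """Sum of negated, batch-averaged clipped objectives for the discrete
    (beam pattern) head and the continuous (power) head. Each sample is a dict
    with hypothetical keys new_d, old_d, new_c, old_c (log-probs) and adv."""
    n = len(batch)
    loss_d = -sum(clipped_surrogate(s["new_d"], s["old_d"], s["adv"], eps) for s in batch) / n
    loss_c = -sum(clipped_surrogate(s["new_c"], s["old_c"], s["adv"], eps) for s in batch) / n
    return loss_d + loss_c


batch = [
    {"new_d": 0.0, "old_d": 0.0, "new_c": 0.0, "old_c": 0.0, "adv": 1.0},
    {"new_d": 0.0, "old_d": 0.0, "new_c": 0.0, "old_c": 0.0, "adv": 3.0},
]
print(hybrid_ppo_loss(batch))  # -4.0
```

When the old and new log-probabilities are identical, every ratio is 1. Each head's loss then reduces to the negated mean advantage, which gives -2.0 per head here. Clipping only takes effect once an update moves a head's ratio outside [1 - eps, 1 + eps].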

Paper Structure

This paper contains 16 sections, 30 equations, 13 figures, 2 tables, and 1 algorithm.

Figures (13)

  • Figure 1: The forward link of the multi-NGSO beam hopping communication system.
  • Figure 2: Beam hopping illumination patterns and power allocation.
  • Figure 3: PPO framework for the proposed optimization problem.
  • Figure 4: The convergence of reward.
  • Figure 5: The convergence of throughput.
  • ...and 8 more figures