Aerial Reliable Collaborative Communications for Terrestrial Mobile Users via Evolutionary Multi-Objective Deep Reinforcement Learning

Geng Sun, Jian Xiao, Jiahui Li, Jiacheng Wang, Jiawen Kang, Dusit Niyato, Shiwen Mao

TL;DR

This work employs collaborative beamforming through a UAV-enabled virtual antenna array to improve transmission performance from the UAVs to terrestrial mobile users, under interference from non-associated BSs and dynamic channel conditions.

Abstract

Unmanned aerial vehicles (UAVs) have emerged as potential aerial base stations (BSs) for improving terrestrial communications. However, the limited onboard energy and antenna power of a UAV restrict its communication range and transmission capability. To address these limitations, this work employs collaborative beamforming (CB) through a UAV-enabled virtual antenna array to improve transmission performance from the UAVs to terrestrial mobile users, under interference from non-associated BSs and dynamic channel conditions. Specifically, we introduce a memory-based random walk model to depict the mobility patterns of terrestrial mobile users more accurately. We then formulate a multi-objective optimization problem (MOP) that maximizes the transmission rate while minimizing the flight energy consumption of the UAV swarm. Because the formulated MOP is NP-hard and the environment is highly dynamic, we transform the problem into a multi-objective Markov decision process and propose an improved evolutionary multi-objective reinforcement learning algorithm. The algorithm uses an evolutionary learning approach to obtain an approximate Pareto set for the formulated MOP. It also incorporates a long short-term memory (LSTM) network and a hyper-sphere-based task selection method to discern the movement patterns of terrestrial mobile users and to improve the diversity of the obtained Pareto set. Simulation results demonstrate that the proposed method generates a diverse range of non-dominated policies and outperforms existing methods. Additional simulations demonstrate the scalability and robustness of the proposed CB-based method under different system parameters and various unexpected circumstances.
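
To make the notion of a Pareto set of non-dominated policies concrete, the following is a minimal sketch of non-dominated filtering for the two objectives named in the abstract (transmission rate to be maximized, flight energy to be minimized). The function names, the tuple layout, and the sample values are illustrative assumptions and are not taken from the paper; the paper's algorithm obtains its approximate Pareto set through evolutionary learning, which this sketch does not reproduce.

```python
from typing import List, Tuple

# Hypothetical representation: (transmission_rate, flight_energy) per policy.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a dominates b: a is no worse in both objectives
    (higher or equal rate, lower or equal energy) and strictly better in one."""
    rate_a, energy_a = a
    rate_b, energy_b = b
    no_worse = rate_a >= rate_b and energy_a <= energy_b
    strictly_better = rate_a > rate_b or energy_a < energy_b
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up objective values, purely for illustration.
candidates = [(5.0, 10.0), (4.0, 8.0), (3.0, 9.0), (5.0, 12.0), (2.0, 3.0)]
front = pareto_front(candidates)
```

Here (3.0, 9.0) is dominated by (4.0, 8.0) and (5.0, 12.0) is dominated by (5.0, 10.0), so three trade-off points remain. No single one of them is best in both objectives, which is why a set of policies is reported rather than one.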

Paper Structure

This paper contains 41 sections, 23 equations, 8 figures, 1 table, and 4 algorithms.

Figures (8)

  • Figure 1: A UAV-enabled A2G communication system, where a UAV swarm is deployed to transmit data to a remote terrestrial mobile user via CB. The system has a central controller for the UAVs, and a non-associated BS may interfere with the communications.
  • Figure 2: The algorithmic framework of EMOPPO-VLH. The algorithm begins with a warm-up stage designed to generate a high-quality primary population. It then advances to the evolutionary stage, which comprises task population update, task selection, acquisition of the offspring population, and EP archive update. The tasks selected during this stage are optimized using the LSTM-MOPPO algorithm, resulting in a new generation of offspring. The architecture of LSTM-MOPPO is the part enclosed by the black dashed line in the diagram.
  • Figure 3: An illustrative example of the performance buffer and hyper-sphere-based task selection strategies. (a) Performance buffer strategy: The lines emanating from the origin represent the buffer. Each circle denotes an objective value calculated by the corresponding policy. Circles outlined in black represent policies that are preserved in the performance buffer. (b) Hyper-sphere-based task selection strategy: The circle around each dot symbolizes the sub-hyper-sphere; the fewer policies it contains, the higher the probability that the corresponding policy will be selected as the learning task.
  • Figure 4: Schematic map of the simulation setup. UAVs are dispersed across a 100 m $\times$ 100 m region, and a terrestrial mobile user moves randomly within another 100 m $\times$ 100 m rectangular region. The BS is positioned at coordinates (100, 100) in meters.
  • Figure 5: Optimization results obtained by various algorithms in the small-scale scenario. (a) $f_1$ obtained by different algorithms. (b) $f_2$ obtained by different algorithms.
  • ...and 3 more figures
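
The idea in the Figure 3(b) caption, that a policy in a sparsely populated sub-hyper-sphere is more likely to be chosen as a learning task, can be sketched as follows. This is only an illustration under stated assumptions: the fixed `radius`, the Euclidean distance in objective space, and the weight `1 / (1 + neighbors)` are hypothetical choices, and the paper's actual selection rule may differ.

```python
import math
from typing import List, Sequence


def selection_probabilities(
    objectives: Sequence[Sequence[float]], radius: float
) -> List[float]:
    """Assign each policy a selection probability that decreases with the
    number of other policies inside its sub-hyper-sphere.

    objectives: one objective vector per policy (assumed already normalized).
    radius: hypothetical sub-hyper-sphere radius in objective space.
    """
    weights = []
    for i, p in enumerate(objectives):
        # Count the other policies that fall inside the sphere centered at p.
        neighbors = sum(
            1
            for j, q in enumerate(objectives)
            if j != i and math.dist(p, q) <= radius
        )
        # Assumed weighting: fewer neighbors gives a larger weight.
        weights.append(1.0 / (1 + neighbors))
    total = sum(weights)
    return [w / total for w in weights]


# Three crowded policies and one isolated policy (made-up values).
policies = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
probs = selection_probabilities(policies, radius=0.5)
```

A task could then be drawn with `random.choices(range(len(policies)), weights=probs, k=1)`. Favoring isolated policies steers learning effort toward under-explored regions of the objective space, which is how the caption links this strategy to a more diverse Pareto set.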