UAV-enabled Collaborative Beamforming via Multi-Agent Deep Reinforcement Learning
Saichao Liu, Geng Sun, Jiahui Li, Shuang Liang, Qingqing Wu, Pengfei Wang, Dusit Niyato
TL;DR
The paper tackles the joint optimization of UAV positions and excitation current weights for UAV-enabled collaborative beamforming (CB) toward remote base stations, aiming to maximize the transmission rate of the UAV-enabled virtual antenna array (UVAA) while minimizing motion energy. It models the problem as a multi-agent Markov game and introduces HATRPO-UCB, an improved heterogeneous-agent trust region policy optimization (HATRPO) algorithm with three enhancements: observation augmentation, an agent-specific global state, and a Beta-distributed policy that handles bounded actions. In simulations, HATRPO-UCB converges faster and achieves better energy-rate tradeoffs than the baselines, and ablation studies confirm the value of each enhancement. The approach offers a scalable, real-time framework for energy-efficient, CB-enabled UAV swarms in dynamic air-to-ground (A2G) networks.
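As a rough illustration of why a Beta-distributed policy suits bounded actions, the sketch below samples from a Beta distribution, whose support is [0, 1], and rescales the sample to an action interval. It is a minimal sketch and not the authors' implementation: the function name `sample_bounded_action`, the shape parameters, and the bounds are all assumptions chosen for the example. In a learned policy, a network would output the shape parameters.

```python
import random

def sample_bounded_action(alpha, beta, low, high, rng):
    """Draw u ~ Beta(alpha, beta) on [0, 1], then rescale to [low, high].

    alpha and beta are fixed here for illustration (assumption);
    a policy network would normally produce them per observation.
    """
    u = rng.betavariate(alpha, beta)  # always lies in [0, 1]
    return low + (high - low) * u

# Hypothetical example: a 1-D action bounded to [-1, 1].
rng = random.Random(0)
samples = [sample_bounded_action(2.0, 3.0, -1.0, 1.0, rng) for _ in range(1000)]
```

Unlike a Gaussian policy, whose samples must be clipped to the feasible range, every sample here already lies inside the bounds, so no probability mass is assigned to infeasible actions.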
Abstract
In this paper, we investigate an unmanned aerial vehicle (UAV)-assisted air-to-ground communication system, where multiple UAVs form a UAV-enabled virtual antenna array (UVAA) to communicate with remote base stations by utilizing collaborative beamforming. To improve the work efficiency of the UVAA, we formulate a UAV-enabled collaborative beamforming multi-objective optimization problem (UCBMOP) to simultaneously maximize the transmission rate of the UVAA and minimize the energy consumption of all UAVs by optimizing the positions and excitation current weights of all UAVs. This problem is challenging because the two optimization objectives conflict with each other and are non-concave with respect to the optimization variables. Moreover, the system is dynamic and the cooperation among UAVs is complex, so traditional methods require considerable time to compute a solution for even a single task. In addition, when the task changes, the previously obtained solution becomes obsolete. To handle these issues, we leverage multi-agent deep reinforcement learning (MADRL) to address the UCBMOP. Specifically, we use heterogeneous-agent trust region policy optimization (HATRPO) as the basic framework and propose an improved HATRPO algorithm, namely HATRPO-UCB, in which three techniques are introduced to enhance performance. Simulation results demonstrate that the proposed algorithm learns a better strategy than the other methods. Moreover, extensive experiments demonstrate the effectiveness of the proposed techniques.
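To make the role of the two optimization variables concrete, the following sketch evaluates a textbook far-field array factor for a set of UAV positions and excitation current weights, assuming ideal phase synchronization toward a target direction. The abstract does not give the paper's exact signal model, so the formula, the function name `array_factor`, and all numerical values are assumptions for illustration only.

```python
import cmath
import math

def array_factor(positions, weights, theta, phi, theta0, phi0, wavelength):
    """Magnitude of a far-field array factor in direction (theta, phi).

    positions: list of (x, y, z) UAV coordinates in metres (assumed values).
    weights:   excitation current weights, one per UAV.
    (theta0, phi0): target direction; initial phases are assumed to be
    perfectly compensated so all signals add in phase there.
    """
    k = 2 * math.pi / wavelength  # wavenumber

    def projection(p, th, ph):
        x, y, z = p
        return (x * math.sin(th) * math.cos(ph)
                + y * math.sin(th) * math.sin(ph)
                + z * math.cos(th))

    total = 0j
    for p, w in zip(positions, weights):
        phase = k * (projection(p, theta, phi) - projection(p, theta0, phi0))
        total += w * cmath.exp(1j * phase)
    return abs(total)

# Hypothetical three-UAV array and a 0.125 m wavelength (about 2.4 GHz).
positions = [(0.0, 0.0, 100.0), (5.0, 2.0, 102.0), (-3.0, 4.0, 98.0)]
weights = [1.0, 0.8, 0.7]
target = (math.pi / 3, math.pi / 4)

peak = array_factor(positions, weights, *target, *target, wavelength=0.125)
off = array_factor(positions, weights, math.pi / 2, 0.0, *target, wavelength=0.125)
```

In the target direction the contributions add coherently, so `peak` equals the sum of the weights; in any other direction the magnitude cannot exceed that sum. Moving the UAVs or changing the weights reshapes the beam pattern, which is why both appear as decision variables, while moving costs energy.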
