Optimizing UAV Aerial Base Station Flights Using DRL-based Proximal Policy Optimization
Mario Rico Ibanez, Azim Akhtarshenas, David Lopez-Perez, Giovanni Geraci
TL;DR
This work tackles real-time UAV trajectory optimization for emergency cellular networks using a DRL-based Proximal Policy Optimization framework. It replaces GPS-based state information with AoA/reference-signal measurements to drive continuous-action UAV movements, optimizing the objective $R_{\text{fair}}(\bm{\rho}^{\mathrm{U}}, \bm{\rho}^{\mathrm{D}})$, the sum of $\log$-transformed UE data rates. The approach leverages PPO’s clipped objective and GAE within a continuous action space $(\alpha, r)$, with a state vector that includes past UAV positions, SINR histories, and AoA statistics over memory length $M$. Experiments across static, linear, circular, and hotspot UE mobility patterns show that the PPO-driven UAVs maintain high throughput and adapt to mobility, often outperforming static central placements, and exhibit robustness to AoA estimation noise. The results suggest practical viability for deploying continuously controlled UAV-based base stations in emergency scenarios and motivate extending to multi-UAV coordination for enhanced coverage and reliability.
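As a rough illustration of the three ingredients named above, the following NumPy sketch computes a sum-of-log-rates reward, the standard PPO clipped surrogate, and generalized advantage estimation (GAE). It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `eps` guard, and the default values of `clip_eps`, `gamma`, and `lam` are hypothetical, and the UE rates are taken as given inputs rather than derived from $\bm{\rho}^{\mathrm{U}}$ and $\bm{\rho}^{\mathrm{D}}$ through a channel model.

```python
import numpy as np


def fair_reward(rates, eps=1e-9):
    """Sum of log-transformed UE data rates.

    `rates` is an array of per-UE rates (any consistent unit).
    `eps` is an assumed guard against log(0) for UEs in outage.
    """
    rates = np.asarray(rates, dtype=float)
    return float(np.sum(np.log(rates + eps)))


def ppo_clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate objective (to be maximized).

    `ratio` is pi_new(a|s) / pi_old(a|s) per sample.
    `clip_eps` = 0.2 is a common default, assumed here.
    """
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    # Taking the element-wise minimum removes the incentive to move the
    # policy far from the old one in a single update.
    return float(np.mean(np.minimum(unclipped, clipped)))


def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one trajectory.

    `values` must have len(rewards) + 1 entries; the last entry is the
    bootstrap value of the state reached after the final step.
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    advantages = np.zeros_like(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error at time t.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

With equal total throughput, the log transform rewards the more balanced allocation: `fair_reward([2.0, 2.0])` exceeds `fair_reward([1.0, 3.0])`, which is why a sum of logarithms is commonly used as a fairness-aware objective.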
Abstract
Unmanned aerial vehicle (UAV)-based base stations offer a promising solution in emergencies, where the rapid deployment of cutting-edge networks is crucial for maximizing life-saving potential. Optimizing the positioning of these UAVs is essential for enhancing communication efficiency. This paper introduces an automated reinforcement learning approach that enables UAVs to interact dynamically with their environment and determine optimal configurations. By leveraging the radio signal sensing capabilities of communication networks rather than GPS-based state information, our method reflects a more realistic setting. It uses a state-of-the-art algorithm, proximal policy optimization (PPO), to learn and generalize positioning strategies across diverse user equipment (UE) movement patterns. We evaluate our approach across various UE mobility scenarios, including static, random, linear, circular, and mixed hotspot movements. The numerical results demonstrate the algorithm's adaptability and effectiveness in maintaining comprehensive coverage across all movement patterns.
