Reinforcement Learning Based Dynamic Power Control for UAV Mobility Management
Irshad A. Meer, Karl-Ludwig Besser, Mustafa Ozger, H. Vincent Poor, Cicek Cavdar
TL;DR
The paper tackles energy-efficient downlink power control for UAV mobility management under time-varying reliability requirements, leveraging cooperation among multiple base stations. It formulates a multi-objective optimization problem and solves it with a model-free reinforcement learning method, Soft Actor-Critic (SAC), in which the actions are the power allocations $P_{T,ik}$ and the reward trades off total power usage against the outage constraints on $\varepsilon_i(t)$. A key contribution is the derivation of the outage probability for a sum of exponentially distributed random variables under fading, which enables appropriate reliability targets $\varepsilon_{\max}$ to be enforced in normal and critical zones; the learned policy then adapts to UAV movement and zone changes. Numerical results in single-user and multi-user scenarios show that the SAC-based approach reduces energy consumption while meeting stringent reliability targets, highlighting its practical potential for dynamic UAV-enabled networks in future 6G sky infrastructure.
Abstract
Modern communication systems need to fulfill multiple and often conflicting objectives at the same time. In particular, new applications require high reliability while operating at low transmit powers. Moreover, reliability constraints may vary over time depending on the current state of the system. One solution to address this problem is to use joint transmissions from a number of base stations (BSs) to meet the reliability requirements. However, this approach is inefficient in terms of total transmit power. In this work, we propose a reinforcement learning-based power allocation scheme for an unmanned aerial vehicle (UAV) communication system with varying communication reliability requirements. In particular, the proposed scheme aims to minimize the total transmit power of all BSs while achieving an outage probability that is less than a tolerated threshold. This threshold varies over time, e.g., when the UAV enters a critical zone with high reliability requirements. Our results show that the proposed learning scheme uses dynamic power allocation to meet varying reliability requirements, thus effectively conserving energy.
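To make the trade-off between total transmit power and the outage threshold concrete, the following Python sketch evaluates the outage probability of a joint transmission and a penalized reward. It is an illustration under stated assumptions, not the authors' implementation: the received power from each BS is assumed to be an independent exponential random variable with mean equal to transmit power times channel gain (as under Rayleigh fading), the means are assumed to be pairwise distinct so that the standard hypoexponential distribution applies, and an outage is assumed to occur when the summed received power falls below a threshold. The function names, the `penalty` weight, and all numerical values are hypothetical.

```python
import numpy as np


def outage_probability(mean_rx_power, threshold):
    """P(sum_k X_k < threshold) for independent X_k ~ Exp(mean=mean_rx_power[k]).

    Uses the hypoexponential CDF, which requires pairwise distinct means.
    """
    rates = 1.0 / np.asarray(mean_rx_power, dtype=float)
    survival = 0.0
    for k, rate_k in enumerate(rates):
        others = np.delete(rates, k)
        # Partial-fraction weight of the k-th exponential term.
        weight = np.prod(others / (others - rate_k))
        survival += weight * np.exp(-rate_k * threshold)
    return 1.0 - survival


def reward(tx_power, channel_gain, threshold, eps_max, penalty=10.0):
    """Hypothetical reward: negative total power minus a penalty on violations."""
    tx_power = np.asarray(tx_power, dtype=float)
    channel_gain = np.asarray(channel_gain, dtype=float)
    eps = outage_probability(tx_power * channel_gain, threshold)
    violation = max(0.0, eps - eps_max)  # zero when the target is met
    return -float(np.sum(tx_power)) - penalty * violation


# Illustrative values only (assumptions, not taken from the paper).
tx_power = [1.0, 2.0]      # power allocated by two cooperating BSs
channel_gain = [1.0, 1.0]  # average channel gains towards the UAV
threshold = 0.1            # minimum summed received power

eps = outage_probability(np.multiply(tx_power, channel_gain), threshold)

# Monte Carlo cross-check of the closed-form expression.
rng = np.random.default_rng(0)
samples = rng.exponential(scale=np.multiply(tx_power, channel_gain),
                          size=(200_000, 2)).sum(axis=1)
eps_mc = float(np.mean(samples < threshold))

# The same allocation can satisfy a normal-zone target but not a critical one.
r_normal = reward(tx_power, channel_gain, threshold, eps_max=1e-2)
r_critical = reward(tx_power, channel_gain, threshold, eps_max=1e-3)
print(f"closed form: {eps:.5f}, Monte Carlo: {eps_mc:.5f}")
print(f"reward (normal zone): {r_normal:.4f}, reward (critical zone): {r_critical:.4f}")
```

In this toy setting the allocation meets the looser target but is penalized under the stricter one, which is the situation in which a learned policy would need to raise the transmit power after the UAV enters a critical zone.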
