Reinforcement Learning Based Dynamic Power Control for UAV Mobility Management

Irshad A. Meer; Karl-Ludwig Besser; Mustafa Ozger; H. Vincent Poor; Cicek Cavdar

TL;DR

The paper tackles energy-efficient downlink power control for UAV mobility management under time-varying reliability, leveraging multi-base-station cooperation. It formulates a multiobjective optimization and solves it with a model-free reinforcement learning method, specifically Soft Actor-Critic (SAC), where actions are the power allocations $P_{T,ik}$ and rewards balance total power usage against outage constraints $\varepsilon_i(t)$. A key contribution is deriving the outage probability for the sum of exponentials under fading, enabling appropriate reliability targets $\varepsilon_{\max}$ in normal and critical zones, and demonstrating a learning-based policy that adapts to UAV movement and changing zones. Numerical results in single and multi-user scenarios show the SAC-based approach reduces energy consumption while meeting stringent reliability targets, highlighting its practical potential for dynamic UAV-enabled networks in future 6G sky infrastructure.
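The outage expression mentioned above can be illustrated with a standard result. The following is a minimal sketch, assuming independent Rayleigh-type fading so that the power received from each BS is exponentially distributed with a distinct mean, and assuming that an outage occurs when the summed received power falls below a threshold. The function name `outage_probability`, its parameters, and the fading assumption are illustrative and not taken from the paper.

```python
import math


def outage_probability(mean_powers, threshold):
    """CDF of a sum of independent exponential variables at `threshold`.

    mean_powers: mean received power per BS (assumed pairwise distinct).
    threshold:   minimum summed received power needed to avoid an outage.
    Uses the standard hypoexponential closed form; this is an assumption,
    not necessarily the expression derived in the paper.
    """
    rates = [1.0 / m for m in mean_powers]
    if len(set(rates)) != len(rates):
        raise ValueError("mean powers must be pairwise distinct")

    survival = 0.0
    for i, rate_i in enumerate(rates):
        # Partial-fraction coefficient for the i-th exponential term.
        coeff = 1.0
        for j, rate_j in enumerate(rates):
            if j != i:
                coeff *= rate_j / (rate_j - rate_i)
        survival += coeff * math.exp(-rate_i * threshold)
    return 1.0 - survival


# Hypothetical example: two BSs with mean received powers 1.0 and 0.5.
eps = outage_probability([1.0, 0.5], threshold=1.0)
```

Under these assumptions, raising the mean received power of any BS lowers the outage probability, which is the lever a power-control policy trades off against total transmit power.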

Abstract

Modern communication systems need to fulfill multiple and often conflicting objectives at the same time. In particular, new applications require high reliability while operating at low transmit powers. Moreover, reliability constraints may vary over time depending on the current state of the system. One solution to address this problem is to use joint transmissions from a number of base stations (BSs) to meet the reliability requirements. However, this approach is inefficient when considering the overall total transmit power. In this work, we propose a reinforcement learning-based power allocation scheme for an unmanned aerial vehicle (UAV) communication system with varying communication reliability requirements. In particular, the proposed scheme aims to minimize the total transmit power of all BSs while achieving an outage probability that is less than a tolerated threshold. This threshold varies over time, e.g., when the UAV enters a critical zone with high-reliability requirements. Our results show that the proposed learning scheme uses dynamic power allocation to meet varying reliability requirements, thus effectively conserving energy.
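Read as a constrained problem, the objective stated in the abstract might be written as below. This is a sketch rather than the paper's own formulation: the symbols $P_{T,ik}$, $\varepsilon_i(t)$, and $\varepsilon_{\max}$ follow the summary above, while the index ranges ($N$ users, $K$ BSs), the time-dependent threshold notation $\varepsilon_{\max}(t)$, and the per-link power cap $P_{\max}$ are assumptions.

$$
\min_{\{P_{T,ik}(t)\}} \; \sum_{i=1}^{N} \sum_{k=1}^{K} P_{T,ik}(t)
\quad \text{s.t.} \quad
\varepsilon_i(t) \le \varepsilon_{\max}(t), \qquad
0 \le P_{T,ik}(t) \le P_{\max}, \qquad \forall i, k .
$$

Here $\varepsilon_{\max}(t)$ takes the stricter value whenever user $i$ is inside the critical zone, which is what makes the reliability constraint time-varying.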

Paper Structure (11 sections, 6 equations, 3 figures)


Introduction
System Model and Problem Formulation
Problem Formulation
Reinforcement Learning Approach
Numerical Results
Comparison Schemes
Full Power
Closest Base Station
Single User -- Deterministic Path
Multiple Users -- Random Movement
Conclusion

Figures (3)

Figure 1: The considered communication scenario with fixed base stations (BSs) and moving UAVs. Within the highlighted zone in the center, the reliability requirement is $\varepsilon_{\text{max},2}$; otherwise it is $\varepsilon_{\text{max},1}>\varepsilon_{\text{max},2}$.
Figure 2: Numerical results of the outage probability $\varepsilon$ and the fraction of the total available power used to transmit over time. The single aerial user moves in a straight path diagonally across the $1.5\,\text{km}\times1.5\,\text{km}$ area, in which $K=6$ BSs are placed. During the highlighted interval $t\in[750, 1000]$, the UAV is within the critical zone with a stricter reliability target. (See section "Single User -- Deterministic Path".)
Figure 3: Numerical results of the distributions of the outage probability $\varepsilon$ and the fraction of the total available power. There are $N=3$ aerial users that move in an area of size $3\,\text{km}\times3\,\text{km}$ according to the stochastic movement model from [Smith2022]. A total of $K=19$ BSs are placed in the area to serve them. The critical zone, which has a higher reliability target, spans $[0.75, 2]\,\text{km}$ in both the $x$- and $y$-coordinates. (See section "Multiple Users -- Random Movement".)
