Enhancing UAV Path Planning Efficiency Through Accelerated Learning

Joseanne Viana; Boris Galkin; Lester Ho; Holger Claussen

TL;DR

The paper addresses memory and convergence bottlenecks in DRL-based UAV path planning for wireless relay scenarios by introducing an Enhanced-TD3 (E-TD3) framework that integrates PCA-based dimensionality reduction, prior sampling of reduced state spaces, Prioritized Experience Replay (PER), and a hybrid MSE/MAE critic loss. The approach reduces the effective state dimension to $22\%$ of the original while preserving coverage fidelity, and converges about $4\times$ faster than standard TD3, as evidenced by the reported episode counts and MAE-based comparisons between the original and reduced coverage maps. These contributions enable faster, more scalable UAV relay planning with lower memory demands, supporting near real-time coverage optimization in telecommunication networks. The practical gain comes from pairing dimensionality reduction with established DRL techniques to improve learning efficiency in complex, geography-aware wireless environments.

Abstract

Unmanned Aerial Vehicles (UAVs) are increasingly essential in various fields such as surveillance, reconnaissance, and telecommunications. This study aims to develop a learning algorithm for the path planning of UAV wireless communication relays that reduces storage requirements and accelerates Deep Reinforcement Learning (DRL) convergence. Assuming the system possesses terrain maps of the area and can estimate user locations using localization algorithms or direct GPS reporting, it can feed these parameters into the learning algorithms to achieve optimized path planning performance. However, higher-resolution terrain maps are necessary to extract topological information such as terrain height, object distances, and signal blockages. This requirement increases memory and storage demands on UAVs while also lengthening convergence times in DRL algorithms. Similarly, defining the telecommunication coverage map for UAV wireless communication relays from these terrain maps and user position estimates increases the memory and storage used by the learning-based path planning algorithms. Our approach reduces path planning training time by applying a dimensionality reduction technique based on Principal Component Analysis (PCA), sample combination, Prioritized Experience Replay (PER), and a combination of Mean Squared Error (MSE) and Mean Absolute Error (MAE) losses on the coverage map estimates, thereby enhancing a Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm. The proposed solution reduces the number of episodes needed for basic training to converge by a factor of approximately four compared to traditional TD3.
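Two of the ingredients named in the abstract can be sketched in a few lines of plain Python. The abstract does not state how the MSE and MAE terms are weighted, so the mixing weight `alpha` in `hybrid_loss` is an assumption. `per_probabilities` follows the standard proportional form of Prioritized Experience Replay, with the exponent and `eps` chosen arbitrarily; the paper's exact variant may differ.

```python
def hybrid_loss(predicted, target, alpha=0.5):
    """Weighted sum of MSE and MAE over paired values.

    alpha is a hypothetical mixing weight: 1.0 gives pure MSE, 0.0 pure MAE.
    """
    if len(predicted) != len(target) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - t for p, t in zip(predicted, target)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return alpha * mse + (1.0 - alpha) * mae

def per_probabilities(td_errors, exponent=0.6, eps=1e-6):
    """Proportional PER: sampling probability grows with |TD error|.

    exponent=0 recovers uniform sampling; eps keeps every transition reachable.
    """
    priorities = [(abs(e) + eps) ** exponent for e in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]

# Errors are [0, -1, 2]: MSE = 5/3, MAE = 1, so the mix is 0.5*5/3 + 0.5*1.
loss = hybrid_loss([1.0, 2.0, 4.0], [1.0, 3.0, 2.0])
probs = per_probabilities([0.1, 2.0, 0.5])
print(loss, probs)
```

The MSE term penalizes large errors strongly, while the MAE term is less sensitive to outliers. Mixing the two is one plausible reading of the "combination" described above.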

Paper Structure (15 sections, 12 equations, 5 figures, 2 tables, 1 algorithm)

Introduction
System Model
Channel Model
Coverage Estimation
TD3 algorithm for Path Planning
State Representation
Action Space
Reward Function
The dimensionality reduction algorithm for State space
Results
Convergence Rate
Comparison between coverage maps in the batch size
Conclusions
Future Work
Acknowledgements

Figures (5)

Figure 1: Details of the air-to-ground connections in the simulation environment.
Figure 2: Details on the air-to-ground wireless propagation estimates.
Figure 3: Coverage Map Resolution Comparison
Figure 4: 100 average training comparison after running 500 episodes. (obtaining a reasonable training score using TD3, TD3+PCA, E-TD3)
Figure 5: MAE Between the Original and Dimensionality Reduced Coverage Maps in one Batch Size
