Dynamic Q-planning for Online UAV Path Planning in Unknown and Complex Environments
Lidia Gianne Souza da Rocha, Kenny Anderson Queiroz Caldas, Marco Henrique Terra, Fabio Ramos, Kelen Cristiane Teixeira Vivaldini
TL;DR
The paper tackles online UAV path planning in unknown, complex environments by introducing a dynamic iteration strategy for Q-Learning that enables reliable real-time trajectory generation. The pipeline combines LIDAR-based environment mapping, a Plannie-based planner with a safety module, and cubic spline smoothing that turns the discrete waypoints produced by Q-Learning into smooth, executable paths. In SITL-Gazebo simulations of indoor and outdoor scenarios, the dynamic-iteration Q-Learning approach achieves 100% trajectory completeness, needs less computation time than fixed-iteration Q-Learning, and performs competitively against A*, RRT, and PSO while retaining real-time replanning capability. The work advances practical UAV navigation by balancing exploration, safety, and efficiency, offering a scalable solution for unknown environments and time-constrained missions.
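The smoothing step can be illustrated with a minimal sketch, assuming a natural cubic spline fitted independently to the x and y coordinates of the grid waypoints and parameterized by waypoint index (unit knot spacing). The paper's exact spline formulation and boundary conditions are not given here, so `smooth_path`, `natural_spline_second_derivs`, and their parameters are hypothetical. The sketch only interpolates; it does not recheck the smoothed curve against obstacles.

```python
from typing import List, Sequence, Tuple


def natural_spline_second_derivs(y: Sequence[float]) -> List[float]:
    """Second derivatives M of a natural cubic spline on unit-spaced knots.

    Interior rows: M[i-1] + 4*M[i] + M[i+1] = 6*(y[i-1] - 2*y[i] + y[i+1]),
    with M[0] = M[n-1] = 0 (natural end conditions), solved by the Thomas algorithm.
    """
    n = len(y)
    m = [0.0] * n
    k = n - 2  # number of interior knots
    if k <= 0:
        return m  # two points: a straight segment, no curvature
    c_p = [0.0] * k  # modified super-diagonal
    d_p = [0.0] * k  # modified right-hand side
    for j in range(k):
        i = j + 1
        rhs = 6.0 * (y[i - 1] - 2.0 * y[i] + y[i + 1])
        denom = 4.0 if j == 0 else 4.0 - c_p[j - 1]
        c_p[j] = 1.0 / denom
        d_p[j] = (rhs if j == 0 else rhs - d_p[j - 1]) / denom
    # Back substitution fills the interior second derivatives M[1..n-2].
    m[k] = d_p[k - 1]
    for j in range(k - 2, -1, -1):
        m[j + 1] = d_p[j] - c_p[j] * m[j + 2]
    return m


def _segment(y: Sequence[float], m: Sequence[float], i: int, u: float) -> float:
    """Evaluate the cubic on knot interval [i, i+1] at local parameter u in [0, 1]."""
    return (m[i] * (1.0 - u) ** 3 / 6.0 + m[i + 1] * u ** 3 / 6.0
            + (y[i] - m[i] / 6.0) * (1.0 - u) + (y[i + 1] - m[i + 1] / 6.0) * u)


def smooth_path(waypoints: Sequence[Tuple[float, float]],
                samples_per_segment: int = 10) -> List[Tuple[float, float]]:
    """Densely sample a natural cubic spline through the (x, y) waypoints."""
    if len(waypoints) < 2:
        raise ValueError("need at least two waypoints")
    xs = [float(p[0]) for p in waypoints]
    ys = [float(p[1]) for p in waypoints]
    mx = natural_spline_second_derivs(xs)
    my = natural_spline_second_derivs(ys)
    out: List[Tuple[float, float]] = []
    for i in range(len(waypoints) - 1):
        for s in range(samples_per_segment):
            u = s / samples_per_segment
            out.append((_segment(xs, mx, i, u), _segment(ys, my, i, u)))
    out.append((xs[-1], ys[-1]))  # include the final waypoint exactly
    return out


if __name__ == "__main__":
    grid_path = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]  # hypothetical L-shaped route
    smooth = smooth_path(grid_path, samples_per_segment=4)
    print(len(smooth), smooth[:3])
```

Because the spline interpolates every waypoint, the smoothed curve still passes through the cells the planner chose, but it can bulge slightly near sharp corners; in a full system that is where a safety check on the sampled points would belong.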
Abstract
Unmanned Aerial Vehicles (UAVs) need an online path planning capability to carry out high-risk missions safely in unknown and complex environments. However, many algorithms reported in the literature may not return reliable trajectories when solving online problems in these scenarios. The Q-Learning algorithm, a reinforcement learning technique, can generate trajectories in real time and has demonstrated fast and reliable results. Its drawback, however, is that the number of iterations must be defined in advance: if this value is poorly chosen, the algorithm either takes too long or fails to return an optimal trajectory. We therefore propose a method that dynamically chooses the number of iterations to obtain the best performance from Q-Learning. The proposed method is compared with Q-Learning using a fixed number of iterations, A*, Rapidly-exploring Random Tree (RRT), and Particle Swarm Optimization (PSO). The results demonstrate the efficacy and reliability of the proposed Q-Learning algorithm, with a dynamic number of iterations, for online path planning in missions carried out in unknown and complex environments.
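The abstract does not state how the iteration count is chosen. As a rough illustration only, the sketch below runs tabular Q-Learning on a hypothetical occupancy grid and stops training once the greedy path extracted from the Q-table has stayed unchanged for `patience` consecutive episodes, with `max_episodes` as a hard cap. The stopping rule, the reward values, and every name (`q_learning_dynamic`, `patience`, the grid layout) are assumptions for illustration, not the authors' criterion.

```python
import random
from typing import Dict, List, Optional, Set, Tuple

Cell = Tuple[int, int]
ACTIONS: List[Cell] = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # 4-connected moves


def step(state: Cell, action: Cell, size: Tuple[int, int],
         obstacles: Set[Cell], goal: Cell) -> Tuple[Cell, float, bool]:
    """One deterministic grid move; blocked moves keep the agent in place."""
    nxt = (state[0] + action[0], state[1] + action[1])
    inside = 0 <= nxt[0] < size[0] and 0 <= nxt[1] < size[1]
    if not inside or nxt in obstacles:
        return state, -10.0, False  # collision penalty (assumed value)
    if nxt == goal:
        return nxt, 100.0, True     # goal reward (assumed value)
    return nxt, -1.0, False         # per-step cost favours short paths


def best_action(q: Dict[Cell, List[float]], s: Cell) -> int:
    """Index of the highest Q-value in state s (first one wins ties)."""
    return max(range(len(ACTIONS)), key=lambda i: q[s][i])


def greedy_path(q, start, goal, size, obstacles, max_len) -> Optional[List[Cell]]:
    """Follow argmax Q from start; None if the policy loops or stalls."""
    path, s = [start], start
    for _ in range(max_len):
        if s == goal:
            return path
        s, _, _ = step(s, ACTIONS[best_action(q, s)], size, obstacles, goal)
        if s in path:
            return None
        path.append(s)
    return path if s == goal else None


def q_learning_dynamic(size, obstacles, start, goal, alpha=0.5, gamma=0.95,
                       epsilon=0.2, patience=20, max_episodes=5000, seed=0):
    """Train until the greedy path is unchanged for `patience` episodes."""
    rng = random.Random(seed)
    q = {(x, y): [0.0] * len(ACTIONS)
         for x in range(size[0]) for y in range(size[1])}
    max_steps = 4 * size[0] * size[1]
    last_path: Optional[List[Cell]] = None
    stable = 0
    for episode in range(1, max_episodes + 1):
        s = start
        for _ in range(max_steps):
            # Epsilon-greedy exploration
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = best_action(q, s)
            s2, r, done = step(s, ACTIONS[a], size, obstacles, goal)
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])  # Q-Learning update
            s = s2
            if done:
                break
        # Dynamic stopping rule (assumed): stop once the extracted path settles
        path = greedy_path(q, start, goal, size, obstacles, max_steps)
        if path is not None and path == last_path:
            stable += 1
            if stable >= patience:
                return path, episode
        else:
            stable = 0
        last_path = path
    return last_path, max_episodes


if __name__ == "__main__":
    walls = {(3, y) for y in range(5)}  # wall with a single gap at y = 5
    path, episodes = q_learning_dynamic((6, 6), walls, (0, 0), (5, 0))
    print(f"stopped after {episodes} episodes; path: {path}")
```

Setting `patience` larger than `max_episodes` turns this into the fixed-iteration baseline, which always trains for the full `max_episodes`; the dynamic rule instead trades a small risk of stopping before the Q-table has fully converged for a shorter, data-driven training time.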
