Dynamic Q-planning for Online UAV Path Planning in Unknown and Complex Environments

Lidia Gianne Souza da Rocha, Kenny Anderson Queiroz Caldas, Marco Henrique Terra, Fabio Ramos, Kelen Cristiane Teixeira Vivaldini

TL;DR

The paper tackles online UAV path planning in unknown, complex environments by introducing a dynamic iteration strategy for Q-Learning, enabling reliable real-time trajectory generation. It integrates environment mapping via LIDAR, a Plannie-based planner with a safety module, and cubic spline smoothing that converts the planner's discrete path into a smooth, executable trajectory. In SITL-Gazebo simulations of indoor and outdoor scenarios, the dynamic-iteration Q-Learning approach achieves 100% trajectory completeness with lower computation than fixed-iteration Q-Learning and performance competitive with A*, RRT, and PSO, while retaining real-time replanning capability. The work advances practical UAV navigation by balancing exploration, safety, and efficiency, offering a scalable solution for unknown environments and time-constrained missions.

Abstract

Unmanned Aerial Vehicles need an online path planning capability to safely complete high-risk missions in unknown and complex environments. However, many algorithms reported in the literature may not return reliable trajectories for online problems in these scenarios. Q-Learning, a reinforcement learning technique, can generate trajectories in real time and has demonstrated fast and reliable results. Its disadvantage is that the number of iterations must be defined in advance: if this value is poorly chosen, the algorithm either takes a long time or fails to return an optimal trajectory. Therefore, we propose a method that dynamically chooses the number of iterations to obtain the best performance from Q-Learning. The proposed method is compared with Q-Learning with a fixed number of iterations, A*, Rapidly-exploring Random Tree, and Particle Swarm Optimization. The results demonstrate the efficacy and reliability of Q-Learning with a dynamic number of iterations for online path planning in unknown and complex environments.
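
To make the idea of a dynamic iteration count concrete, here is a minimal sketch of tabular Q-Learning on a small grid that stops training on its own instead of running a fixed number of episodes. The abstract does not say how the paper chooses the number of iterations, so the stopping rule used here (stop once the greedy path reaches the goal and its length has stayed the same for `patience` consecutive episodes) is an assumption for illustration only. The grid, the reward values, and the names `step`, `greedy_path`, and `train` are likewise hypothetical.

```python
import random

# Hypothetical 4-connected grid: 0 = free cell, 1 = obstacle.
GRID = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
START, GOAL = (0, 0), (4, 4)
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Apply a move; hitting a wall or obstacle leaves the state unchanged."""
    r, c = state[0] + MOVES[action][0], state[1] + MOVES[action][1]
    if not (0 <= r < len(GRID) and 0 <= c < len(GRID[0])) or GRID[r][c] == 1:
        return state, -5.0, False   # collision penalty (assumed value)
    if (r, c) == GOAL:
        return (r, c), 100.0, True  # goal reward (assumed value)
    return (r, c), -1.0, False      # per-step cost (assumed value)


def greedy_path(q, max_len=100):
    """Follow the greedy policy from START; return None if it never reaches GOAL."""
    state, path = START, [START]
    while state != GOAL and len(path) < max_len:
        action = max(range(4), key=lambda a: q[state][a])
        nxt, _, _ = step(state, action)
        if nxt == state or nxt in path:  # collision or loop: policy not ready yet
            return None
        state = nxt
        path.append(state)
    return path if state == GOAL else None


def train(max_episodes=5000, patience=20, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Q-Learning with an assumed plateau-based stopping rule (not the paper's)."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * 4 for r in range(len(GRID)) for c in range(len(GRID[0]))}
    best, stable = None, 0
    for episode in range(1, max_episodes + 1):
        state = START
        for _ in range(200):  # cap on steps per episode
            if rng.random() < epsilon:
                action = rng.randrange(4)  # explore
            else:
                action = max(range(4), key=lambda a: q[state][a])  # exploit
            nxt, reward, done = step(state, action)
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
        # Dynamic stopping: count episodes in which the greedy path length is unchanged.
        path = greedy_path(q)
        if path is not None and best is not None and len(path) == len(best):
            stable += 1
        else:
            stable = 0
        best = path
        if path is not None and stable >= patience:
            return path, episode
    return best, max_episodes


if __name__ == "__main__":
    path, episodes = train()
    print(episodes, path)
```

With a rule of this kind, `max_episodes` serves only as a safety cap; the number of episodes actually run depends on how quickly the greedy path settles, which is the trade-off between planning time and trajectory quality that the abstract describes.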

Paper Structure (17 sections, 9 equations, 6 figures, 4 tables, 2 algorithms)

Figures (6)

  • Figure 1: How the UAV discovers the unknown environment: (a) generating a trajectory along an accessible path, (b) identifying a collision along the trajectory, and (c) replanning the trajectory to avoid the obstacle responsible for the collision.
  • Figure 2: Python simulations in a small and simple environment: (a) trajectory generated with the spline alone, (b) trajectory generated with the A* algorithm alone, and (c) trajectory generated with the A* algorithm combined with the spline.
  • Figure 3: Python simulations in a large and complex environment: (a) trajectory generated with the spline alone, (b) trajectory generated with the A* algorithm alone, and (c) trajectory generated with the A* algorithm combined with the spline.
  • Figure 4: Top and side views of the environments. The indoor environment is built from cardboard boxes, while the outdoor environment consists of trees: (a) top view of the indoor environment, (b) top view of the outdoor environment, (c) side view of the indoor environment, and (d) side view of the outdoor environment.
  • Figure 5: Rewards of the Q-Learning algorithm at each iteration in (a) the indoor environment and (b) the outdoor environment.
  • ...and 1 more figures