NavRL: Learning Safe Flight in Dynamic Environments

Zhefan Xu, Xinming Han, Haoyu Shen, Hanyu Jin, Kenji Shimada

TL;DR

NavRL tackles safe autonomous flight in dynamic environments by learning a navigation policy with Proximal Policy Optimization (PPO). It combines a purpose-built perception system, a tailored state-action representation, and a velocity-obstacle (VO) based safety shield that mitigates the safety risks of neural-network policies, and it achieves zero-shot sim-to-real transfer. The key contributions are a dual-representation obstacle perception system, a CNN-augmented PPO policy trained with curriculum learning, and a VO-based safety mechanism, validated through extensive simulations and real-world flights. The framework supports real-time navigation and, in the reported experiments, produces the fewest collisions among the compared benchmarks.

Abstract

Safe flight in dynamic environments requires unmanned aerial vehicles (UAVs) to make effective decisions when navigating cluttered spaces with moving obstacles. Traditional approaches often decompose decision-making into hierarchical modules for prediction and planning. Although these handcrafted systems can perform well in specific settings, they might fail if environmental conditions change and often require careful parameter tuning. Additionally, their solutions could be suboptimal due to the use of inaccurate mathematical model assumptions and simplifications aimed at achieving computational efficiency. To overcome these limitations, this paper introduces the NavRL framework, a deep reinforcement learning-based navigation method built on the Proximal Policy Optimization (PPO) algorithm. NavRL utilizes our carefully designed state and action representations, allowing the learned policy to make safe decisions in the presence of both static and dynamic obstacles, with zero-shot transfer from simulation to real-world flight. Furthermore, the proposed method adopts a simple but effective safety shield for the trained policy, inspired by the concept of velocity obstacles, to mitigate potential failures associated with the black-box nature of neural networks. To accelerate convergence, we implement the training pipeline using NVIDIA Isaac Sim, enabling parallel training with thousands of quadcopters. Simulation and physical experiments show that our method ensures safe navigation in dynamic environments and results in the fewest collisions among the compared benchmarks.
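The abstract describes the safety shield only as "inspired by the concept of velocity obstacles". As a rough illustration of that concept, the sketch below checks whether a 2D candidate velocity falls inside the collision cone of a disc-shaped obstacle, and, if the policy's velocity is unsafe, picks the nearest safe velocity from a coarse polar grid. The function names, the infinite time horizon, the 2D simplification, and the grid search are all assumptions made for illustration; this is not the authors' formulation of the shield.

```python
import math


def in_velocity_obstacle(p_robot, v_candidate, p_obs, v_obs, combined_radius):
    """Return True if v_candidate lies inside the infinite-horizon velocity obstacle.

    combined_radius is the robot radius plus the obstacle radius (assumed discs).
    """
    rx, ry = p_obs[0] - p_robot[0], p_obs[1] - p_robot[1]          # robot -> obstacle
    wx, wy = v_candidate[0] - v_obs[0], v_candidate[1] - v_obs[1]  # relative velocity
    dist = math.hypot(rx, ry)
    if dist <= combined_radius:
        return True   # already overlapping: every velocity counts as unsafe
    dot = rx * wx + ry * wy
    if dot <= 0.0:
        return False  # relative motion does not approach the obstacle
    cross = rx * wy - ry * wx
    half_angle = math.asin(combined_radius / dist)  # half-angle of the collision cone
    return math.atan2(abs(cross), dot) < half_angle


def shield(v_policy, p_robot, obstacles, v_max, n_angles=16, n_speeds=4):
    """Return v_policy if it is safe, else the closest safe velocity on a polar grid.

    obstacles is a list of (position, velocity, combined_radius) tuples.
    Returns None when no grid candidate is safe (hypothetical fallback).
    """
    def unsafe(v):
        return any(in_velocity_obstacle(p_robot, v, p, vo, r) for p, vo, r in obstacles)

    if not unsafe(v_policy):
        return v_policy
    best, best_cost = None, float("inf")
    for k in range(n_speeds + 1):
        speed = v_max * k / n_speeds
        for i in range(n_angles):
            theta = 2.0 * math.pi * i / n_angles
            v = (speed * math.cos(theta), speed * math.sin(theta))
            cost = (v[0] - v_policy[0]) ** 2 + (v[1] - v_policy[1]) ** 2
            if cost < best_cost and not unsafe(v):
                best, best_cost = v, cost
    return best
```

A grid search is used here only because it is easy to read; a real shield would more likely solve for the minimum velocity change directly rather than sampling candidates.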

Paper Structure

This paper contains 12 sections, 13 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: A customized quadcopter UAV navigating a dynamic environment using the proposed NavRL framework. The robot achieves safe navigation and effective collision avoidance with both static and dynamic obstacles.
  • Figure 2: The proposed NavRL framework. The perception system processes RGB-D images along with the robot's internal states to generate representations for both static and dynamic obstacles. These representations are then fed into two feature extractors, which produce state embeddings concatenated with the robot's internal states. In the training phase, an actor-critic network structure is utilized to train robots in parallel within the NVIDIA Isaac Sim environment. During the deployment stage, the policy network generates actions that are further refined by a safety shield mechanism to ensure safe robot control.
  • Figure 3: Illustration of map ray casting. Only rays within the maximum range are shown. (a) A top-down view of horizontally cast rays with a 360-degree casting angle. (b) A side view displaying rays in the vertical planes.
  • Figure 4: Visualization of example RL policy actions from Beta distributions.
  • Figure 5: Illustration of determining the safe velocity region using the velocity obstacle-based method. (a) An example scenario where the robot encounters two obstacles. (b) The corresponding velocity obstacle plot with blue arrows showing the minimum velocity change required to exit the VO regions.
  • ...and 4 more figures
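Figure 4 indicates that the policy's actions are drawn from Beta distributions. A Beta sample lies in [0, 1], so it has to be rescaled before it can serve as a velocity command. The sketch below shows one plausible way to do that: sample each component independently and map it linearly to [-v_max, v_max]. The linear mapping, the per-axis independence, and the names `to_velocity`, `sample_action`, and `v_max` are assumptions for illustration and are not taken from the listing above.

```python
import random


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution, useful as a deterministic action."""
    return alpha / (alpha + beta)


def to_velocity(unit_action, v_max):
    """Map each component from [0, 1] to [-v_max, v_max] (assumed linear scaling)."""
    return tuple(v_max * (2.0 * a - 1.0) for a in unit_action)


def sample_action(params, v_max, rng=random):
    """Sample one velocity command.

    params is a sequence of (alpha, beta) pairs, one per velocity axis,
    with alpha > 0 and beta > 0. rng is any object exposing betavariate().
    """
    unit = [rng.betavariate(a, b) for a, b in params]
    return to_velocity(unit, v_max)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same sample
    params = [(2.0, 2.0), (5.0, 1.0), (1.0, 5.0)]  # hypothetical network outputs
    print(sample_action(params, v_max=2.0, rng=rng))
    # Deterministic alternative: use the distribution means instead of samples.
    print(to_velocity([beta_mean(a, b) for a, b in params], v_max=2.0))
```

One practical appeal of a Beta policy is that its support is bounded, so the rescaled action always respects the velocity limit without clipping.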