Deep RL-based Autonomous Navigation of Micro Aerial Vehicles (MAVs) in a complex GPS-denied Indoor Environment

Amit Kumar Singh, Prasanth Kumar Duba, P. Rajalakshmi

TL;DR

This work tackles GPS-denied indoor MAV navigation with a Deep-Proximal Policy Optimization (D-PPO) framework that operates on monocular RGB images converted to depth. A CNN-based policy is trained end-to-end in Unreal Engine-based simulations (AirSim) and validated on real hardware, including a DJI Tello, reducing computational latency during training by a reported 91% without significant degradation in performance. The main contributions are the D-PPO-based learning pipeline, monocular-depth-to-action control on a 7x7 grid, and real-world validation in the TiHAN testbed with notable gains in mean safe flight and real-time navigation. The results demonstrate practical viability for autonomous MAV operation in cluttered indoor environments and offer a path toward denser scenarios.

Abstract

Autonomy of Unmanned Aerial Vehicles (UAVs) in indoor environments poses significant challenges because reliable GPS signals are unavailable in enclosed spaces such as warehouses, factories, and other indoor facilities. Micro Aerial Vehicles (MAVs) are preferred for navigating these complex, GPS-denied scenarios because of their agility and low power consumption, although their computational capabilities are limited. In this paper, we propose a Reinforcement Learning-based Deep-Proximal Policy Optimization (D-PPO) algorithm that enhances real-time navigation by improving computational efficiency. The end-to-end network is trained in realistic 3D meta-environments created using the Unreal Engine. With these trained meta-weights, the MAV system underwent extensive experimental trials in real-world indoor environments. The results indicate that the proposed method reduces computational latency by 91% during the training period without significant degradation in performance. The algorithm was also tested on a DJI Tello drone, yielding similar results.
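
The abstract does not spell out the D-PPO loss, so the following is only a minimal sketch of the standard PPO clipped surrogate objective that methods in this family typically build on. It is not the authors' implementation. The function name `ppo_clipped_loss`, its arguments, and the default `clip_eps=0.2` are illustrative assumptions.

```python
import math


def ppo_clipped_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Negative PPO clipped surrogate, averaged over a batch of samples.

    Assumption: this is the generic PPO-Clip objective, not necessarily the
    exact loss used by the paper's D-PPO variant.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic choice between the unclipped and clipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    # Negate so that minimizing the loss maximizes the surrogate objective.
    return -total / len(advantages)


# Hypothetical batch of three samples (values chosen only for illustration).
loss = ppo_clipped_loss(
    new_log_probs=[-1.0, -0.5, -2.0],
    old_log_probs=[-1.1, -0.7, -1.5],
    advantages=[0.5, 1.0, -0.3],
)
```

Clipping limits how far a single update can move the policy away from the one that collected the data. This is a common reason for choosing PPO when training samples are expensive, as they are in a simulator.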

Paper Structure

This paper contains 11 sections, 2 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Autonomous Navigation of Micro Aerial Vehicle (MAV) inside the TiHAN-Testbed using a Deep-PPO based reinforcement learning algorithm.
  • Figure 2: Proposed system architecture: Conversion of monocular RGB image to depth image followed by CNN training with Deep-PPO algorithm.
  • Figure 3: Plot of moving average value variations across episodes in ten simulations.
  • Figure 4: A 3D realistic meta simulated arena. Left: navigation in the Vanleer environment. Right: navigation in the Cloud environment.