Quadrotor Navigation using Reinforcement Learning with Privileged Information

Jonathan Lee, Abhishek Rathod, Kshitij Goel, John Stecklein, Wennie Tabib

TL;DR

A reinforcement learning-based quadrotor navigation method that leverages efficient differentiable simulation, novel loss functions, and privileged information to navigate around large obstacles. It achieves an 86% success rate and outperforms baseline strategies by 34%.

Abstract

This paper presents a reinforcement learning-based quadrotor navigation method that leverages efficient differentiable simulation, novel loss functions, and privileged information to navigate around large obstacles. Prior learning-based methods perform well in scenes that exhibit narrow obstacles, but struggle when the goal location is blocked by large walls or terrain. In contrast, the proposed method utilizes time-of-arrival (ToA) maps as privileged information and a yaw alignment loss to guide the robot around large obstacles. The policy is evaluated in photo-realistic simulation environments containing large obstacles, sharp corners, and dead-ends. Our approach achieves an 86% success rate and outperforms baseline strategies by 34%. We deploy the policy onboard a custom quadrotor in outdoor cluttered environments both during the day and night. The policy is validated across 20 flights, covering 589 meters without collisions at speeds up to 4 m/s.
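To make the two guidance ideas in the abstract concrete, here is a minimal sketch of a time-of-arrival map on a 2D occupancy grid and a yaw alignment penalty derived from it. Everything here is an assumption for illustration: the function names, the grid representation, and the loss form 1 - cos(yaw error) are hypothetical and not taken from the paper. Dijkstra's algorithm over 8-connected cells is swapped in as a simpler stand-in for the fast marching method.

```python
import heapq
import math


def toa_map(grid, goal, speed=1.0):
    """Approximate time-of-arrival (ToA) from every free cell to `goal`.

    grid[r][c] == 1 marks an obstacle. Dijkstra over 8-connected cells is a
    stand-in for the fast marching method; diagonal moves may cut obstacle
    corners, which is acceptable for a sketch.
    """
    rows, cols = len(grid), len(grid[0])
    toa = [[math.inf] * cols for _ in range(rows)]
    toa[goal[0]][goal[1]] = 0.0
    heap = [(0.0, goal)]
    steps = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    while heap:
        t, (r, c) = heapq.heappop(heap)
        if t > toa[r][c]:
            continue  # stale queue entry
        for dr, dc in steps:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                nt = t + math.hypot(dr, dc) / speed
                if nt < toa[nr][nc]:
                    toa[nr][nc] = nt
                    heapq.heappush(heap, (nt, (nr, nc)))
    return toa


def descent_heading(toa, cell):
    """Heading atan2(d_row, d_col) toward the neighbor with the steepest ToA decrease.

    Returns None when no neighbor has a lower ToA (for example, at the goal).
    """
    rows, cols = len(toa), len(toa[0])
    r, c = cell
    best_rate, best_step = 0.0, None
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            nr, nc = r + dr, c + dc
            if (dr, dc) == (0, 0) or not (0 <= nr < rows and 0 <= nc < cols):
                continue
            rate = (toa[r][c] - toa[nr][nc]) / math.hypot(dr, dc)
            if rate > best_rate:
                best_rate, best_step = rate, (dr, dc)
    if best_step is None:
        return None
    return math.atan2(best_step[0], best_step[1])


def yaw_alignment_loss(yaw, heading):
    """Hypothetical penalty: 0 when yaw matches the heading, 2 when opposite."""
    return 1.0 - math.cos(yaw - heading)
```

On a grid where a wall separates the start from the goal, the ToA at the start exceeds the straight-line travel time, and the descent heading points along the wall toward its open end rather than at the goal. That is the property that lets a ToA map steer a robot around a large obstacle where a straight-line goal direction would not.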

Paper Structure

This paper contains 22 sections, 8 equations, 10 figures, 3 tables.

Figures (10)

  • Figure 1: This paper develops and deploys an end-to-end policy to navigate in challenging environments. The approach outperforms the state of the art by 34%. An example trajectory is captured using long-exposure photography.
  • Figure 2: Differentiable dynamics enables direct policy updates by performing gradient descent on the loss function.
  • Figure 3: The end-to-end planning and control architecture is trained as a single neural network. Feature extractors process each input, and the resulting features are flattened and summed together. A GRUCell helps to maintain consistent action predictions over time.
  • Figure 4: (a) Heatmap of time-of-arrival (ToA) computed using the fast marching method (FMM), with the gradient field overlaid. (b) Shortest paths along the ToA gradient from starting points (green dots) to the target (red dot) guide the robot around concave obstacle regions.
  • Figure 5: Top-down view of two cylinder-shaped training environments with random primitive obstacles and starting points at a fixed radius from the goal (blue). Trajectories illustrate paths following the time-of-arrival map (yellow to blue).
  • ...and 5 more figures
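
The idea in the Figure 2 caption, updating a policy by gradient descent through differentiable dynamics, can be illustrated with a toy example. The sketch below is an assumption-laden stand-in, not the paper's simulator or network: the "policy" is a single scalar gain k, the dynamics are a 1D point mass, and the derivative of the rollout loss is propagated by hand through every simulation step. All names and constants are hypothetical.

```python
def rollout_loss_and_grad(k, x0=1.0, goal=0.0, dt=0.1, steps=50):
    """Roll out x_{t+1} = x_t + dt * u_t with u_t = -k * (x_t - goal).

    Returns the summed squared distance to the goal and its exact derivative
    with respect to k, obtained by differentiating each simulation step.
    """
    x, dx_dk = x0, 0.0
    loss, dloss_dk = 0.0, 0.0
    for _ in range(steps):
        err = x - goal
        u = -k * err                      # toy policy: proportional control
        du_dk = -err - k * dx_dk          # chain rule through the policy
        x = x + dt * u                    # differentiable dynamics step
        dx_dk = dx_dk + dt * du_dk        # sensitivity of the state to k
        loss += (x - goal) ** 2
        dloss_dk += 2.0 * (x - goal) * dx_dk
    return loss, dloss_dk


def train(k=0.1, lr=0.01, iters=100):
    """Plain gradient descent on the rollout loss; returns the updated gain."""
    for _ in range(iters):
        _, grad = rollout_loss_and_grad(k)
        k -= lr * grad
    return k
```

Because the loss gradient flows through the dynamics, no reward sampling or value estimation is needed in this toy setting: each update uses the exact derivative of the rollout. In practice an automatic differentiation framework would replace the hand-written sensitivities.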