Learning Generalizable Policy for Obstacle-Aware Autonomous Drone Racing

Yueqian Liu

TL;DR

This work trains a generalizable obstacle-aware drone racing policy with deep reinforcement learning by randomizing racing tracks and obstacle configurations before every rollout and collecting experience in parallel across the randomized environments.

Abstract

Autonomous drone racing has gained attention for its potential to push the boundaries of drone navigation technologies. While much of the existing research focuses on racing in obstacle-free environments, few studies have addressed the complexities of obstacle-aware racing, and the approaches presented in these studies often overfit, with learned policies generalizing poorly to new environments. This work addresses the challenge of developing a generalizable obstacle-aware drone racing policy using deep reinforcement learning. We propose randomizing racing tracks and obstacle configurations before every rollout and collecting experience in parallel across the randomized environments. Simulated experiments show that this randomization strategy is effective: drones reach speeds of up to 70 km/h while racing in unseen cluttered environments. This study serves as a stepping stone toward learning robust policies for obstacle-aware drone racing and general-purpose drone navigation in cluttered environments. Code is available at https://github.com/ErcBunny/IsaacGymEnvs.
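The idea of re-sampling tracks and obstacles before every rollout can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`sample_track`, `sample_obstacles`, `randomize_envs`), the sampling ranges, and the representation of obstacles as points are all assumptions made for illustration, and the sketch does not use Isaac Gym.

```python
import numpy as np


def sample_track(rng, num_waypoints=4, dist_range=(6.0, 12.0), max_turn_rad=0.6):
    """Hypothetical track sampler: chain waypoints using random relative offsets."""
    positions = [np.zeros(3)]
    heading = 0.0
    for _ in range(num_waypoints - 1):
        # Random relative pose of the next waypoint with respect to the previous one.
        heading += rng.uniform(-max_turn_rad, max_turn_rad)
        dist = rng.uniform(*dist_range)    # horizontal distance in meters (assumed range)
        climb = rng.uniform(-1.0, 1.0)     # height change in meters (assumed range)
        step = np.array([dist * np.cos(heading), dist * np.sin(heading), climb])
        positions.append(positions[-1] + step)
    return np.stack(positions)             # shape: (num_waypoints, 3)


def sample_obstacles(rng, track, per_segment=2, lateral_std=1.0):
    """Hypothetical obstacle sampler: scatter obstacle centers near each track segment."""
    centers = []
    for start, end in zip(track[:-1], track[1:]):
        for _ in range(per_segment):
            t = rng.uniform(0.2, 0.8)      # keep obstacles away from the waypoints themselves
            on_path = start + t * (end - start)
            centers.append(on_path + rng.normal(0.0, lateral_std, size=3))
    return np.stack(centers)               # shape: ((num_waypoints - 1) * per_segment, 3)


def randomize_envs(rng, num_envs):
    """Draw an independent track and obstacle layout for each parallel environment."""
    envs = []
    for _ in range(num_envs):
        track = sample_track(rng)
        envs.append((track, sample_obstacles(rng, track)))
    return envs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Called before every rollout, so the policy never trains on the same layout twice.
    for track, obstacles in randomize_envs(rng, num_envs=3):
        print(track.shape, obstacles.shape)
```

Because every parallel environment draws its own layout before each rollout, a single batch of experience already covers many different tracks, which is the property the abstract credits for better generalization.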

Paper Structure

This paper contains 25 sections, 13 equations, 11 figures.

Figures (11)

  • Figure 1: Trajectories of successful rollouts of a single policy on multiple different racing tracks with obstacles designed to block flight paths.
  • Figure 2: Illustration of part of the target waypoint's guidance reward field. The pass-through region (outlined using black lines) and the waypoint frame (RGB-$xyz$) are displayed at the center. The field spans the entire $\mathbb{R}^3$.
  • Figure 3: Illustration of parameters describing relative waypoint poses (a) and obstacles managed by the obstacle manager (b)-(e). Sub-figure (b) shows orbital obstacles, (c) and (d) show tree-like obstacles from different views, and (e) shows wall-like obstacles between waypoints.
  • Figure 4: Illustration of parallel environments for training. Environments are tiled in Isaac Gym but are independent and asynchronous. Debug views of the waypoints are enabled here for visualization purposes but are disabled during actual training.
  • Figure 5: Logged mean episode length in steps, mean total reward, mean collision reward, and mean waypoint reward throughout training.
  • ...and 6 more figures