Flying on Point Clouds with Reinforcement Learning

Guangtong Xu, Tianyue Wu, Zihan Wang, Qianhao Wang, Fei Gao

TL;DR

The paper tackles autonomous quadrotor flight through clutter, combining onboard 3D lidar with sim-to-real reinforcement learning to achieve low-latency control. It introduces a task-specific lidar exteroception representation that preserves fine obstacles while keeping end-to-end RL lightweight; the resulting policy commands the vehicle through a low-level interface at 50 Hz, and training is aided by dynamics-domain randomization and a lidar sensing simulator. Key contributions include the exteroception input design, the integrated onboard sensing and control pipeline, and demonstrations of safe navigation through thin obstacles and clutter in both simulation and real-world experiments. The results indicate that fine-grained lidar perception paired with sim-to-real learning can yield deployable quadrotor policies with higher success rates in cluttered environments, without offboard computation or a separate trajectory generation and tracking hierarchy.

Abstract

A long-cherished vision of drones is to autonomously traverse clutter to reach every corner of the world using onboard sensing and computation. In this paper, we combine onboard 3D lidar sensing and sim-to-real reinforcement learning (RL) to enable autonomous flight in cluttered environments. Compared to vision sensors, lidars appear to be more straightforward and accurate for geometric modeling of the surroundings, which is one of the most important cues for successful obstacle avoidance. The sim-to-real RL approach, in turn, facilitates low-latency control without the hierarchy of trajectory generation and tracking. We demonstrate that, with design choices of practical significance, we can effectively combine the advantages of 3D lidar sensing and RL to control a quadrotor through a low-level control interface at 50 Hz. The key to learning the policy in a lightweight way lies in a specialized surrogate of the lidar's raw point clouds, which simplifies learning while retaining fine-grained perception to detect narrow free space and thin obstacles. Simulation statistics demonstrate the advantages of the proposed system over alternatives, such as performing easier maneuvers and achieving higher success rates under different speed constraints. With lightweight simulation techniques, the policy trained in the simulator can control a physical quadrotor, which dodges thin obstacles and safely traverses randomly distributed obstacles.
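To make the idea of a point-cloud surrogate concrete, here is a minimal sketch of one plausible construction: the space around the vehicle is divided into angular partitions (as visualized by the grid lines in Figure 1) and each partition is summarized by its nearest lidar return. The function name `spherical_min_range`, the grid resolution, the maximum range, and the choice of the minimum as the per-partition statistic are all assumptions for illustration and are not taken from the paper. The unknown-region handling mentioned in Figure 1(b) is not modeled.

```python
import numpy as np

def spherical_min_range(points, n_azimuth=36, n_elevation=9, max_range=10.0):
    """Summarize a body-frame point cloud as nearest range per angular cell.

    Hypothetical sketch: resolution and max_range are illustrative values,
    not the settings used in the paper.
    """
    points = np.asarray(points, dtype=float).reshape(-1, 3)
    # Cells with no return keep max_range (treated as free up to sensor range).
    grid = np.full((n_elevation, n_azimuth), max_range)
    if points.shape[0] == 0:
        return grid

    r = np.linalg.norm(points, axis=1)
    valid = (r > 1e-6) & (r <= max_range)  # drop degenerate and far returns
    p, r = points[valid], r[valid]

    azimuth = np.arctan2(p[:, 1], p[:, 0])                  # in [-pi, pi]
    elevation = np.arcsin(np.clip(p[:, 2] / r, -1.0, 1.0))  # in [-pi/2, pi/2]

    az_idx = np.floor((azimuth + np.pi) / (2 * np.pi) * n_azimuth).astype(int)
    el_idx = np.floor((elevation + np.pi / 2) / np.pi * n_elevation).astype(int)
    az_idx = np.clip(az_idx, 0, n_azimuth - 1)
    el_idx = np.clip(el_idx, 0, n_elevation - 1)

    # Unbuffered minimum so several points in one cell are all considered.
    np.minimum.at(grid, (el_idx, az_idx), r)
    return grid

cloud = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
grid = spherical_min_range(cloud)
```

Keeping the nearest return per cell means a single thin obstacle cannot be averaged away by background points in the same cell, which is the property the abstract emphasizes; the fixed grid size also gives the policy a constant-length input regardless of how many points the lidar returns.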

Paper Structure

This paper contains 28 sections, 1 equation, 5 figures.

Figures (5)

  • Figure 1: Illustration of the task-relevant lidar sensing representation. (a) The red square cones are examples of partitions for constructing the lidar sensing input, and the red dots represent the raw point cloud. The grid lines on the sphere visualize the partitions. (b) Top: a schematic of the lidar's FoV, where the cartoon image is from the Livox Mid-360 website. Bottom: the unknown region calculated from the historical frames of FoV.
  • Figure 2: The system and policy architectures. The system includes algorithmic components such as an MLP neural controller and estimation algorithms. The lidar sensing representation from the MLP encoder, along with velocity $\boldsymbol{v}$, attitude $\boldsymbol{\mathrm q}$, goal direction $\boldsymbol{\mathrm g}$, last desired thrust $T_{\mathrm{last}}$, and last desired body rate $\boldsymbol \omega_{\mathrm{last}}$, is fed into the MLP fusion module. The output command is the desired thrust $T$ and body rate $\boldsymbol \omega$.
  • Figure 3: The evolution of the undiscounted return of the proposed exteroception representation vs. an occupancy map during training. $N_e$ and $N_{bs}$ denote the number of training environments and the mini-batch size, respectively.
  • Figure 4: Benchmark results with previous systems under different maximum speeds. Panels (a)-(c) compare success rates in scenarios I, II, and III, respectively. Scenario I: A 40 m $\times$ 10 m map with around 100 obstacles, and the obstacle radius is randomly sampled within 0.5 $\sim$ 0.7 m. Scenario II: The size of the obstacles is set within 0.5 $\sim$ 0.7 m, but the number of obstacles is increased to around 130. Scenario III: The obstacle radius is randomized between 0.5 $\sim$ 1.2 m, with around 90 obstacles. (d) Example (successful) trajectories of the compared approaches in scenario III with a speed constraint of 3.0 m/s, with the corresponding flight distances.
  • Figure 5: Results of indoor flight of the proposed system. (a) Indoor scenario I. (b) Indoor scenario II. The quadrotor can fly through the cluttered environments safely. The arrows represent the flight direction. (c) & (d) Two highlighted trials of thin-obstacle avoidance and corresponding snapshots.
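
The data flow described in the Figure 2 caption (an MLP encoder for the lidar representation, followed by an MLP fusion module that also receives $\boldsymbol{v}$, $\boldsymbol{\mathrm q}$, $\boldsymbol{\mathrm g}$, $T_{\mathrm{last}}$, and $\boldsymbol \omega_{\mathrm{last}}$) can be sketched as a plain forward pass. Everything beyond that input/output list is an assumption made for illustration: the layer widths, the `tanh` activation, the 4-element quaternion attitude, the flattened lidar input size, and the random weights. The helper names `init_mlp`, `mlp`, and `policy_step` are hypothetical.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Random weights for a fully connected net (illustrative, untrained)."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(x, layers):
    """Forward pass with tanh on hidden layers (activation is an assumption)."""
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:
            x = np.tanh(x)
    return x

def policy_step(encoder, fusion, lidar, v, q, g, t_last, w_last):
    """Map one observation to a (thrust, body rate) command."""
    z = mlp(np.ravel(lidar), encoder)                    # encode lidar input
    state = np.concatenate([z, v, q, g, [t_last], w_last])
    out = mlp(state, fusion)                             # fuse with proprioception
    return out[0], out[1:4]                              # T, omega

rng = np.random.default_rng(0)
lidar_dim, latent_dim = 9 * 36, 32                       # assumed sizes
encoder = init_mlp(rng, [lidar_dim, 64, latent_dim])
fusion = init_mlp(rng, [latent_dim + 3 + 4 + 3 + 1 + 3, 64, 4])

thrust, body_rate = policy_step(
    encoder, fusion,
    lidar=np.full((9, 36), 10.0),
    v=np.zeros(3),
    q=np.array([1.0, 0.0, 0.0, 0.0]),
    g=np.array([1.0, 0.0, 0.0]),
    t_last=0.0,
    w_last=np.zeros(3),
)
```

Feeding the previous command back as an input, as the caption indicates, gives a feedforward network access to its last action; a deployed controller would additionally scale and clip the outputs to the vehicle's thrust and body-rate limits, which this sketch omits.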