Flying on Point Clouds with Reinforcement Learning
Guangtong Xu, Tianyue Wu, Zihan Wang, Qianhao Wang, Fei Gao
TL;DR
The paper tackles autonomous quadrotor flight through clutter by combining onboard 3D lidar with sim-to-real reinforcement learning (RL) for low-latency control. It introduces a task-specific lidar exteroception representation that preserves thin obstacles and narrow free space while keeping learning lightweight. The resulting end-to-end policy commands the quadrotor through a low-level control interface at 50 Hz, and its training is aided by dynamics-domain randomization and a lidar sensing simulator. Key contributions include the exteroception input design, the integrated onboard sensing and control pipeline, and demonstrations of safe navigation through thin obstacles and clutter in both simulation and real-world experiments. The work shows that lidar perception paired with sim-to-real learning can yield deployable quadrotor policies that reach higher success rates in cluttered environments, using onboard sensing and computation and without a separate trajectory generation and tracking hierarchy.
Abstract
A long-cherished vision for drones is to traverse clutter autonomously and reach every corner of the world using only onboard sensing and computation. In this paper, we combine onboard 3D lidar sensing and sim-to-real reinforcement learning (RL) to enable autonomous flight in cluttered environments. Compared to vision sensors, lidars are more straightforward and accurate for geometric modeling of the surroundings, which is one of the most important cues for successful obstacle avoidance. The sim-to-real RL approach, in turn, enables low-latency control without the hierarchy of trajectory generation and tracking. We demonstrate that, with design choices of practical significance, the advantages of 3D lidar sensing and RL can be combined effectively to control a quadrotor through a low-level control interface at 50 Hz. The key to learning the policy in a lightweight way is a specialized surrogate of the lidar's raw point clouds, which simplifies learning while retaining perception fine-grained enough to detect narrow free space and thin obstacles. Simulation statistics demonstrate the advantages of the proposed system over alternatives, such as easier maneuvers and higher success rates under different speed constraints. With lightweight simulation techniques, the policy trained in the simulator can control a physical quadrotor, which dodges thin obstacles and safely traverses randomly distributed obstacles.
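To make the idea of a point-cloud surrogate concrete, the following is a minimal sketch of one generic way to compress a raw cloud into a small, fixed-size input while keeping thin obstacles visible: bin the points by azimuth and elevation and keep the nearest range in each bin. This is an assumption for illustration only and is not the representation used in the paper. The function name `min_range_image`, the bin counts, the elevation limits, and the maximum range are all hypothetical.

```python
import numpy as np


def min_range_image(points, n_az=36, n_el=8, el_limits=(-0.5, 0.5), max_range=10.0):
    """Bin (N, 3) sensor-frame points into an (n_el, n_az) grid of nearest ranges.

    Hypothetical illustration: the bin counts, elevation limits (radians),
    and max_range (meters) are assumed values, not taken from the paper.
    """
    points = np.asarray(points, dtype=float).reshape(-1, 3)
    # Cells with no return keep the maximum range, i.e. "free as far as we can see".
    image = np.full((n_el, n_az), max_range)

    rng = np.linalg.norm(points, axis=1)
    keep = (rng > 1e-6) & (rng < max_range)
    points, rng = points[keep], rng[keep]

    az = np.arctan2(points[:, 1], points[:, 0])  # azimuth in [-pi, pi]
    el = np.arcsin(points[:, 2] / rng)           # elevation in [-pi/2, pi/2]
    in_fov = (el >= el_limits[0]) & (el < el_limits[1])
    az, el, rng = az[in_fov], el[in_fov], rng[in_fov]

    col = np.floor((az + np.pi) / (2.0 * np.pi) * n_az).astype(int) % n_az
    span = el_limits[1] - el_limits[0]
    row = np.clip(np.floor((el - el_limits[0]) / span * n_el).astype(int), 0, n_el - 1)

    # Unbuffered minimum: a single close return (e.g. a thin pole) wins over
    # many farther returns that fall into the same bin.
    np.minimum.at(image, (row, col), rng)
    return image


# A thin pole point in front of a wall point that lands in the same angular bin.
cloud = np.array([[2.0, 0.1, 0.1], [5.0, 0.25, 0.25]])
image = min_range_image(cloud)
```

Taking the minimum rather than the mean per bin is the design choice that matters here: averaging would let a sparse set of returns from a thin obstacle be washed out by the background behind it.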
