Learning Deep Sensorimotor Policies for Vision-based Autonomous Drone Racing

Jiawei Fu; Yunlong Song; Yan Wu; Fisher Yu; Davide Scaramuzza

TL;DR

The paper addresses vision-based autonomous drone racing by eliminating the need for global state estimation and trajectory planning through a deep sensorimotor policy learned from raw images. It introduces a two-stage learning-by-cheating framework: a privileged-state teacher trained with full state information via PPO, and a vision-only student that learns to map image embeddings to control commands through imitation, aided by BYOL-style contrastive learning and YOLO-based feature extraction. In Flightmare, the vision-based policy achieves racing performance near the state-based policy and near the time-optimal bound, with strong robustness to visual disturbances and distractors. This work demonstrates the feasibility of image-only control for high-speed drones and points toward real-world transfer and history-based (memory) extensions to remove reliance on partial state inputs.
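The BYOL-style objective mentioned above can be illustrated with a minimal sketch. It assumes the standard BYOL formulation (normalized squared error between an online prediction and a target projection, plus an exponential-moving-average target update). The function names, the list-of-floats representation, and the `tau` value are illustrative assumptions, not taken from the paper.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length (assumes v is non-zero).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def byol_loss(online_prediction, target_projection):
    # Normalized squared error between the two embeddings,
    # which equals 2 - 2 * cosine similarity.
    p = l2_normalize(online_prediction)
    z = l2_normalize(target_projection)
    cosine = sum(a * b for a, b in zip(p, z))
    return 2.0 - 2.0 * cosine

def ema_update(target_params, online_params, tau=0.99):
    # The target network slowly tracks the online network;
    # tau is a hypothetical decay rate chosen for illustration.
    return [tau * t + (1.0 - tau) * o
            for t, o in zip(target_params, online_params)]
```

In this formulation the loss is 0 when two augmented views of the same image produce embeddings pointing in the same direction and 4 when they point in opposite directions. No negative pairs are needed.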

Abstract

Autonomous drones can operate in remote and unstructured environments, enabling various real-world applications. However, the lack of effective vision-based algorithms has been a stumbling block to achieving this goal. Existing systems often require hand-engineered components for state estimation, planning, and control. Such a sequential design involves laborious tuning, human heuristics, and compounding delays and errors. This paper tackles the vision-based autonomous-drone-racing problem by learning deep sensorimotor policies. We use contrastive learning to extract robust feature representations from the input images and leverage a two-stage learning-by-cheating framework for training a neural network policy. The resulting policy directly infers control commands with feature representations learned from raw images, forgoing the need for globally-consistent state estimation, trajectory planning, and handcrafted control design. Our experimental results indicate that our vision-based policy can achieve the same level of racing performance as the state-based policy while being robust against different visual disturbances and distractors. We believe this work serves as a stepping-stone toward developing intelligent vision-based autonomous systems that control the drone purely from image inputs, like human pilots.
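The second stage of the learning-by-cheating framework (the student imitating the privileged teacher) can be sketched as plain supervised regression on teacher actions. Everything here is a toy assumption: `teacher_policy` is a hypothetical stand-in for the PPO-trained teacher, the student is a two-weight linear model, and the raw state is reused as the student's features in place of learned image embeddings.

```python
import random

def teacher_policy(state):
    # Hypothetical stand-in for a trained teacher with privileged state access.
    return 0.5 * state[0] - 0.25 * state[1]

def student_policy(weights, features):
    # Linear student: maps observation features to a scalar command.
    return sum(w * f for w, f in zip(weights, features))

def distill(num_steps=2000, lr=0.05, seed=0):
    rng = random.Random(seed)
    weights = [0.0, 0.0]
    for _ in range(num_steps):
        state = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        features = state  # assumption: stands in for an image embedding
        target = teacher_policy(state)  # teacher action used as the label
        error = student_policy(weights, features) - target
        # Gradient step on 0.5 * error**2 with respect to the weights.
        weights = [w - lr * error * f for w, f in zip(weights, features)]
    return weights
```

Calling `distill()` drives the student weights toward the teacher's coefficients. In the paper's setting the student would be a neural network consuming image features, so this shows only the imitation objective, not the architecture.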

Paper Structure (12 sections, 3 equations, 6 figures, 3 tables)

Introduction
Related Work
Methodology
Policy Training
Robust Feature Learning via Data Augmentation
Experiments
Experimental Setup
Baseline Comparisons
Handling Visual Disturbances and Unseen Distractors
Aligning Image Embeddings
Handling Noisy State
Discussion and Conclusion

Figures (6)

Figure 1: Overview of our policy training method. We first train a teacher policy with access to privileged state information using model-free reinforcement learning. This teacher policy is then distilled into a student policy, which is trained to do perception, planning, and control jointly.
Figure 2: Contrastive learning framework (BYOL; Grill et al., 2020).
Figure 3: Visualization of data augmentations used during training. Left: no augmentation. Middle: random convolution. Right: random cutout-color.
Figure 4: Visualization of trajectories. Left: Circle. Middle: Figure8. Right: SplitS.
Figure 5: Success rates of the state-based policy over position drift.
...and 1 more figure
