Generalization through Simulation: Integrating Simulated and Real Data into Deep Reinforcement Learning for Vision-Based Autonomous Flight
Katie Kang, Suneel Belkhale, Gregory Kahn, Pieter Abbeel, Sergey Levine
TL;DR
The paper tackles the challenge of generalizing vision-based autonomous flight policies from limited real-world data by combining large-scale simulated data with sparse real-world experiences. It trains a task-specific perception module in simulation via a Q-function and transfers its CNN layers to a real-world reward-predictor model used in a receding-horizon MPC framework, with the perception layers held fixed to avoid overfitting. Real-world data provides accurate dynamics, while simulation provides diverse visual features that generalize to new environments; this leads to robust collision avoidance on a nano aerial vehicle across varied hallways and lighting. The approach outperforms several baselines, including sim-only, sim-finetuned, and ImageNet/unsupervised transfer methods, demonstrating the value of separating perception learning from dynamics and leveraging task-specific simulation for transfer.
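The pipeline summarized above (frozen perception layers, a reward predictor fit on real data, and receding-horizon action selection) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The fixed random projection standing in for the simulation-trained CNN layers, the linear reward head, the least-squares fit, the random-shooting planner, and every name and size here (`perceive`, `fit_head`, `mpc_step`, `IMG_DIM`, `FEAT_DIM`, `HORIZON`) are assumptions made only for illustration.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
IMG_DIM, FEAT_DIM, HORIZON = 64, 16, 5

# Stand-in for perception layers trained in simulation. A fixed random
# projection replaces the real CNN here; it is never updated below.
W_PERCEPTION = np.random.default_rng(0).normal(size=(IMG_DIM, FEAT_DIM)) / np.sqrt(IMG_DIM)


def perceive(images):
    """Frozen feature extractor: (..., IMG_DIM) -> (..., FEAT_DIM)."""
    return np.tanh(images @ W_PERCEPTION)


def fit_head(images, actions, rewards):
    """Fit only the reward-prediction head; perception weights stay fixed.

    images: (M, IMG_DIM), actions: (M, HORIZON), rewards: (M, HORIZON).
    Returns w_feat with shape (FEAT_DIM, HORIZON) and w_act with shape (HORIZON,).
    """
    feats = perceive(images)
    w_feat = np.zeros((FEAT_DIM, HORIZON))
    w_act = np.zeros(HORIZON)
    for h in range(HORIZON):
        # One least-squares problem per future step h (an assumed toy model).
        design = np.column_stack([feats, actions[:, h]])
        coef, *_ = np.linalg.lstsq(design, rewards[:, h], rcond=None)
        w_feat[:, h], w_act[h] = coef[:-1], coef[-1]
    return w_feat, w_act


def predict_rewards(features, actions, w_feat, w_act):
    """Per-step predicted reward for each candidate action sequence.

    features: (FEAT_DIM,), actions: (N, HORIZON) -> (N, HORIZON).
    """
    return features @ w_feat + actions * w_act


def mpc_step(image, w_feat, w_act, rng, n_candidates=100):
    """Random-shooting MPC: sample action sequences, score them with the
    reward predictor, and return only the first action of the best one."""
    features = perceive(image)
    candidates = rng.uniform(-1.0, 1.0, size=(n_candidates, HORIZON))
    totals = predict_rewards(features, candidates, w_feat, w_act).sum(axis=1)
    return candidates[np.argmax(totals), 0]
```

In a receding-horizon loop, `mpc_step` would be called again on each new camera image, so only the first action of every planned sequence is ever executed. Keeping `W_PERCEPTION` out of `fit_head` mirrors the idea of holding the transferred perception layers fixed while the small real-world dataset shapes only the prediction head.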
Abstract
Deep reinforcement learning provides a promising approach for vision-based control of real-world robots. However, the generalization of such models depends critically on the quantity and variety of data available for training. This data can be difficult to obtain for some types of robotic systems, such as fragile, small-scale quadrotors. Simulated rendering and physics can provide much larger datasets, but such data is inherently of lower quality: many of the phenomena that make the real-world autonomous flight problem challenging, such as complex physics and air currents, are modeled poorly or not at all, and the systematic differences between simulation and the real world are typically impossible to eliminate. In this work, we investigate how data from both simulation and the real world can be combined in a hybrid deep reinforcement learning algorithm. Our method uses real-world data to learn about the dynamics of the system, and simulated data to learn a generalizable perception system that can enable the robot to avoid collisions using only a monocular camera. We demonstrate our approach on a real-world nano aerial vehicle collision avoidance task, showing that with only an hour of real-world data, the quadrotor can avoid collisions in new environments with various lighting conditions and geometry. Code, instructions for building the aerial vehicles, and videos of the experiments can be found at github.com/gkahn13/GtS.
