Generalization through Simulation: Integrating Simulated and Real Data into Deep Reinforcement Learning for Vision-Based Autonomous Flight

Katie Kang, Suneel Belkhale, Gregory Kahn, Pieter Abbeel, Sergey Levine

TL;DR

The paper tackles the challenge of generalizing vision-based autonomous flight policies from limited real-world data by combining large-scale simulated data with sparse real-world experiences. It trains a task-specific perception module in simulation via a Q-function and transfers its CNN layers to a real-world reward-predictor model used in a receding-horizon MPC framework, with the perception layers held fixed to avoid overfitting. Real-world data provides accurate dynamics, while simulation provides diverse visual features that generalize to new environments; this leads to robust collision avoidance on a nano aerial vehicle across varied hallways and lighting. The approach outperforms several baselines, including sim-only, sim-finetuned, and ImageNet/unsupervised transfer methods, demonstrating the value of separating perception learning from dynamics and leveraging task-specific simulation for transfer.
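The receding-horizon control loop mentioned above can be illustrated with a minimal sketch. It assumes a random-shooting planner, which this summary does not confirm as the paper's optimizer, and `toy_reward_predictor` is a hypothetical stand-in for the learned model: it takes an observation and a candidate action sequence and returns one predicted reward per future step. The scalar observation, the action range, and all function names are illustrative assumptions.

```python
import math
import random


def toy_reward_predictor(obs, action_sequence):
    """Hypothetical stand-in for the learned reward predictor.

    obs: a single float, a made-up "bearing to the nearest obstacle" in [-1, 1].
    action_sequence: list of heading commands in [-1, 1], one per future step.
    Returns one predicted reward per step (near 0 is safe, -1 is a collision).
    """
    return [-math.exp(-((a - obs) ** 2) / 0.1) for a in action_sequence]


def select_action(obs, predictor, horizon=5, num_candidates=1024, rng=None):
    """Random-shooting MPC (an assumed planner, not necessarily the paper's).

    Samples candidate action sequences, scores each by its summed predicted
    reward, and returns only the first action of the best sequence. The caller
    executes that action and re-plans from the next observation.
    """
    if rng is None:
        rng = random.Random()
    best_return, best_sequence = -math.inf, None
    for _ in range(num_candidates):
        sequence = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        total = sum(predictor(obs, sequence))
        if total > best_return:
            best_return, best_sequence = total, sequence
    return best_sequence[0]
```

Returning only the first action is what makes the scheme receding-horizon: the rest of the plan is discarded and recomputed at the next time step, so prediction errors do not accumulate over the full horizon.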

Abstract

Deep reinforcement learning provides a promising approach for vision-based control of real-world robots. However, the generalization of such models depends critically on the quantity and variety of data available for training. This data can be difficult to obtain for some types of robotic systems, such as fragile, small-scale quadrotors. Simulated rendering and physics can provide for much larger datasets, but such data is inherently of lower quality: many of the phenomena that make the real-world autonomous flight problem challenging, such as complex physics and air currents, are modeled poorly or not at all, and the systematic differences between simulation and the real world are typically impossible to eliminate. In this work, we investigate how data from both simulation and the real world can be combined in a hybrid deep reinforcement learning algorithm. Our method uses real-world data to learn about the dynamics of the system, and simulated data to learn a generalizable perception system that can enable the robot to avoid collisions using only a monocular camera. We demonstrate our approach on a real-world nano aerial vehicle collision avoidance task, showing that with only an hour of real-world data, the quadrotor can avoid collisions in new environments with various lighting conditions and geometry. Code, instructions for building the aerial vehicles, and videos of the experiments can be found at github.com/gkahn13/GtS

Paper Structure

This paper contains 10 sections, 4 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Our autonomous quadrotor navigating a building from raw monocular images using a learned collision avoidance policy trained with a simulator and one hour of real-world data.
  • Figure 2: Our approach for leveraging both a simulator and real-world data. In simulation, we run reinforcement learning in order to learn a task-specific deep neural network Q-function model. Using real-world data from running the robot, we learn a deep neural network model that predicts future rewards given the current state and a future sequence of actions; this model can be used to form a control policy by selecting actions that maximize future rewards. In order to learn a generalizable reward prediction model with only an hour's worth of real-world data, we transfer the perception neural network layers from the Q-function trained in simulation to be the perception module for the reward predictor. Our experiments demonstrate that (1) fine-tuning the Q-function on real-world data does not lead to good performance, (2) the reward predictor is better suited for real-world learning due to the limited amount of real-world data, and (3) learning a task-specific model in simulation improves transfer of the perception module.
  • Figure 3: A subset of the environments used for simulation training.
  • Figure 4: Our learning-based approach, using only the onboard, grayscale, $72 \times 96$ resolution camera images, flying through a straight, curved, and zig-zag hallway.
  • Figure 5: Example failure: collision with a glass door.
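
The transfer step described in the Figure 2 caption, where perception layers trained in simulation are reused and held fixed while the reward predictor is fit on real-world data, can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's architecture: `perception` is a hypothetical linear-plus-ReLU stand-in for the transferred CNN layers, the head is a single linear layer, and training uses plain stochastic gradient descent on a squared error.

```python
def perception(image, frozen_weights):
    """Hypothetical stand-in for the transferred CNN layers:
    a fixed linear map followed by a ReLU."""
    return [max(0.0, sum(w * x for w, x in zip(row, image)))
            for row in frozen_weights]


def train_head(dataset, frozen_weights, steps=200, lr=0.1):
    """Fit only a linear reward head on top of frozen perception features.

    dataset: list of (image, reward) pairs standing in for real-world data.
    frozen_weights: perception parameters, assumed to come from simulation.
    """
    head = [0.0] * len(frozen_weights)
    for _ in range(steps):
        for image, reward in dataset:
            features = perception(image, frozen_weights)
            error = sum(h * f for h, f in zip(head, features)) - reward
            # Gradient of 0.5 * error**2 with respect to the head only.
            # frozen_weights is never written to, mirroring the fixed layers.
            head = [h - lr * error * f for h, f in zip(head, features)]
    return head
```

Because only the head has trainable parameters, the small real-world dataset cannot distort the visual features learned in simulation, which is the overfitting concern the summary attributes to fine-tuning.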