Learning a visuomotor controller for real world robotic grasping using simulated depth images

Ulrich Viereck; Andreas ten Pas; Kate Saenko; Robert Platt

TL;DR

This work introduces a closed-loop visuomotor controller for robotic grasping. A wrist-mounted sensor supplies depth images, and a CNN predicts the distance to the nearest grasp for candidate gripper motions. Training data are generated entirely in simulation with OpenRAVE: the network learns an L1-optimized distance function from pairs of depth images and actions, and the controller uses that function to approach a grasp iteratively. The approach transfers to real, noisy sensors and outperforms a strong one-shot baseline under kinematic noise and perceptual disturbances, with the largest gains in dynamic scenes where the object moves during the grasp. The results indicate that sim-to-real, depth-based, feedback-controlled grasping is practical in cluttered and shifting environments, and they point to future work on faster corrections and deployment on noisier hardware.

Abstract

We want to build robots that are useful in unstructured real-world applications, such as doing work in the household. Grasping in particular is an important skill in this domain, yet it remains a challenge. One of the key hurdles is handling unexpected changes or motion in the objects being grasped, as well as kinematic noise or other errors in the robot. This paper proposes an approach to learning a closed-loop controller for robotic grasping that dynamically guides the gripper to the object. We use a wrist-mounted sensor to acquire depth images in front of the gripper and train a convolutional neural network to learn a distance function to true grasps for grasp configurations over an image. The training sensor data is generated in simulation, a major advantage over previous work that uses real robot experience, which is costly to obtain. Despite being trained in simulation, our approach works well on real, noisy sensor images. We compare our controller in simulated and real robot experiments to a strong baseline for grasp pose detection, and find that our approach significantly outperforms the baseline in the presence of kinematic noise, perceptual errors, and disturbances of the object during grasping.
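
The closed-loop idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a 2D gripper position, and it replaces the CNN with a hypothetical stand-in, `predicted_distance`, that computes the distance to a known grasp point instead of predicting it from a depth image. The candidate sampling, the `step_fraction` parameter, and all numeric values are likewise assumptions chosen for illustration.

```python
import math
import random


def predicted_distance(gripper, action, grasp):
    """Hypothetical stand-in for the CNN.

    The paper's network predicts distance-to-nearest-grasp from a depth image
    and a candidate action. Here the grasp location is simply known, so the
    distance after applying `action` is computed directly.
    """
    return math.hypot(gripper[0] + action[0] - grasp[0],
                      gripper[1] + action[1] - grasp[1])


def control_step(gripper, grasp, rng, n_samples=64, max_offset=0.05,
                 step_fraction=0.5):
    """One feedback step: score sampled actions, move partway along the best."""
    # The zero action is always a candidate, so the best score never exceeds
    # the current distance.
    candidates = [(0.0, 0.0)]
    candidates += [(rng.uniform(-max_offset, max_offset),
                    rng.uniform(-max_offset, max_offset))
                   for _ in range(n_samples)]
    best = min(candidates, key=lambda a: predicted_distance(gripper, a, grasp))
    # Taking only a fraction of the step leaves room to correct on the next
    # observation (an assumed design choice in this sketch).
    return (gripper[0] + step_fraction * best[0],
            gripper[1] + step_fraction * best[1])


def run_controller(n_steps=60, disturb_at=20, seed=0):
    """Run the loop and return the true distance to the grasp after each step."""
    rng = random.Random(seed)
    gripper, grasp = (0.0, 0.0), (0.25, 0.15)
    history = []
    for t in range(n_steps):
        if t == disturb_at:
            # Simulated disturbance: the object is pushed mid-approach.
            grasp = (grasp[0] - 0.10, grasp[1] + 0.05)
        gripper = control_step(gripper, grasp, rng)
        history.append(math.hypot(gripper[0] - grasp[0],
                                  gripper[1] - grasp[1]))
    return history


if __name__ == "__main__":
    distances = run_controller()
    print(f"distance after first step: {distances[0]:.3f}")
    print(f"distance after last step:  {distances[-1]:.3f}")
```

Because the controller re-evaluates candidate actions at every step, the displacement of the grasp point at step 20 only delays convergence. A one-shot planner that committed to the initial target would end at the old location.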
