6-DoF Grasp Planning using Fast 3D Reconstruction and Grasp Quality CNN
Yahav Avigal, Samuel Paradis, Harry Zhang
TL;DR
This work tackles affordable 6-DoF grasp planning for home robotics by leveraging inexpensive RGB cameras and learned multi-view depth reconstruction. It retrains the Learnt Stereo Machine (LSM) on graspable objects to produce depth maps from multiple views and combines these with a Multi-View GQ-CNN (MV-GQ-CNN) to plan robust 6-DoF grasps across viewpoints. Key contributions include a synthetic data generation pipeline for LSM retraining, an MV-GQ-CNN architecture adapted to varying camera viewpoints, and an evaluation showing that grasp planning on LSM-produced depth maps is feasible. The approach promises practical, low-cost 6-DoF grasp planning for home environments and paves the way for real-robot validation and clutter-aware extensions.
Abstract
Recent consumer demand for home robots has accelerated progress in robotic grasping. However, a key component of the perception pipeline, the depth camera, is still expensive and inaccessible to most consumers. In addition, recent improvements in grasp planning have come from leveraging large datasets and cloud robotics, and from limiting the state and action space to top-down grasps with 4 degrees of freedom (DoF). By exploiting the multi-view geometry of an object using inexpensive equipment such as off-the-shelf RGB cameras and state-of-the-art algorithms such as the Learnt Stereo Machine (LSM \cite{kar2017learning}), a robot can generate more robust 6-DoF grasps from different angles. In this paper, we present an adaptation of LSM to graspable objects, evaluate the grasps, and develop a 6-DoF grasp planner based on the Grasp Quality CNN (GQ-CNN \cite{mahler2017dex}) that exploits multiple camera views to plan a robust grasp, even when no top-down grasp is possible.
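To illustrate how per-view planar grasps can yield a 6-DoF grasp, the following is a minimal sketch and not the implementation used in this paper. It assumes each camera view supplies its pose in the world frame and a list of 4-DoF candidates (position plus rotation about the optical axis) with a quality score, such as a grasp-quality network might produce. The function names, the tuple layout, and the example poses are all hypothetical.

```python
import math

def matmul4(A, B):
    """Multiply two 4x4 homogeneous transforms given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def planar_grasp_to_camera_pose(x, y, depth, theta):
    """Express a 4-DoF grasp as a 4x4 pose in the camera frame.

    (x, y, depth) is the grasp center in camera coordinates and theta is the
    gripper rotation about the camera's optical (z) axis. This layout is an
    assumption made for illustration.
    """
    c, s = math.cos(theta), math.sin(theta)
    return [[c,  -s,  0.0, x],
            [s,   c,  0.0, y],
            [0.0, 0.0, 1.0, depth],
            [0.0, 0.0, 0.0, 1.0]]

def best_grasp_across_views(views):
    """Pick the highest-quality candidate over all views.

    views: list of (T_world_cam, candidates), where candidates is a list of
    (x, y, depth, theta, quality) tuples. Returns (T_world_grasp, quality,
    view_index); T_world_grasp is a full 6-DoF pose in the world frame.
    """
    best = None
    for index, (T_world_cam, candidates) in enumerate(views):
        for x, y, depth, theta, quality in candidates:
            if best is None or quality > best[1]:
                T_cam_grasp = planar_grasp_to_camera_pose(x, y, depth, theta)
                best = (matmul4(T_world_cam, T_cam_grasp), quality, index)
    if best is None:
        raise ValueError("no grasp candidates in any view")
    return best

# Hypothetical setup: a top-down camera at the world origin and a side camera
# whose optical axis points along the world +y axis.
T_TOP = [[1.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 1.0]]
T_SIDE = [[1.0, 0.0, 0.0, 0.0],
          [0.0, 0.0, 1.0, -1.0],
          [0.0, -1.0, 0.0, 0.2],
          [0.0, 0.0, 0.0, 1.0]]

example_views = [
    (T_TOP, [(0.0, 0.0, 0.5, 0.0, 0.2)]),   # weak top-down candidate
    (T_SIDE, [(0.0, 0.0, 1.0, 0.0, 0.9)]),  # strong side candidate
]
T_world_grasp, quality, view_index = best_grasp_across_views(example_views)
```

In this example the side view wins, so the selected grasp approaches horizontally. Each candidate has only 4 DoF within its own view, but composing it with the camera pose gives a grasp whose approach direction is not restricted to top-down in the world frame.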
