CenterArt: Joint Shape Reconstruction and 6-DoF Grasp Estimation of Articulated Objects
Sassan Mokhtar, Eugenio Chisari, Nick Heppert, Abhinav Valada
TL;DR
This paper addresses joint 3D shape reconstruction and 6-DoF grasp estimation for articulated objects from RGB-D images. It introduces CenterArt, a vision-based architecture with an image encoder and an SGDF-based decoder that uses shape and joint latent codes to predict geometry and 6-DoF grasps, then transforms the results into the camera frame. A two-pronged dataset generation strategy builds a large set of valid 6-DoF grasps from PartNet-Mobility objects and realistic SAPIEN kitchen scenes. Empirically, CenterArt outperforms the RL-based baseline (UMPNet) by up to $52\%$ in success rate (SR) on simple scenes, remains robust to depth noise and scene complexity, and improves SR by about $28\%$ overall across the tested scenarios.
Abstract
Precisely grasping and reconstructing articulated objects is key to enabling general robotic manipulation. In this paper, we propose CenterArt, a novel approach for simultaneous 3D shape reconstruction and 6-DoF grasp estimation of articulated objects. CenterArt takes RGB-D images of the scene as input and first predicts the shape and joint codes through an encoder. The decoder then leverages these codes to reconstruct 3D shapes and estimate 6-DoF grasp poses of the objects. We further develop a mechanism for generating a dataset of 6-DoF grasp ground truth poses for articulated objects. CenterArt is trained on realistic scenes containing multiple articulated objects with randomized designs, textures, lighting conditions, and realistic depths. We perform extensive experiments demonstrating that CenterArt outperforms existing methods in accuracy and robustness.
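The TL;DR notes that predicted grasps are transformed into the camera frame. The following is a minimal sketch of that last step only, assuming the decoder outputs grasp poses as 4x4 homogeneous transforms in an object-centric frame and that the object's pose in the camera frame is known. The names `compose`, `grasps_to_camera_frame`, `T_cam_obj`, and `grasp_obj`, as well as the numeric values, are hypothetical and do not come from the paper.

```python
from typing import List

Matrix = List[List[float]]  # 4x4 homogeneous transform, row-major


def compose(a: Matrix, b: Matrix) -> Matrix:
    """Return the matrix product a @ b of two 4x4 transforms."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def grasps_to_camera_frame(T_cam_obj: Matrix,
                           grasps_obj: List[Matrix]) -> List[Matrix]:
    """Map grasp poses expressed in the object frame into the camera frame."""
    return [compose(T_cam_obj, g) for g in grasps_obj]


# Hypothetical object pose in the camera frame (assumption, for illustration):
# rotated 90 degrees about z and placed 0.5 m along the camera z-axis.
T_cam_obj = [
    [0.0, -1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.0, 1.0],
]

# Hypothetical grasp in the object frame: identity orientation,
# offset 0.1 m along the object's x-axis.
grasp_obj = [
    [1.0, 0.0, 0.0, 0.1],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

grasp_cam = grasps_to_camera_frame(T_cam_obj, [grasp_obj])[0]
print([row[3] for row in grasp_cam[:3]])  # grasp position in the camera frame
```

Left-multiplying by the object-to-camera transform leaves the grasp's pose relative to the object unchanged and only re-expresses it in the frame a robot controller would typically consume.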
