Is Image-based Object Pose Estimation Ready to Support Grasping?
Eric C. Joyce, Qianwen Zhao, Nathaniel Burgdorfer, Long Wang, Philippos Mordohai
TL;DR
The paper evaluates whether 6-DoF object poses estimated from a single RGB image can effectively guide robotic grasping. It introduces a physics-based evaluation framework that couples RGB-based pose estimators (DOPE, NCF, EPOS, ZebraPose, GDRNPP) with both a parallel gripper and an underactuated hand in MuJoCo, using ground-truth poses to generate reference grasps and measuring grasp success under an open-loop policy. Results show that improvements in pose accuracy generally boost grasp success for simpler shapes, but grasp success degrades for objects with complex geometry, highlighting the critical roles of gripper design and object geometry. The study concludes that state-of-the-art RGB pose estimators are necessary but not sufficient: gripper selection and object type strongly influence whether grasping is viable in practice, which should inform how perception and manipulation are integrated in RGB-only setups.
Abstract
We present a framework for evaluating 6-DoF instance-level object pose estimators, focusing on those that require a single RGB (not RGB-D) image as input. Besides gaining intuition about how accurate these estimators are, we are interested in the degree to which they can serve as the sole perception mechanism for robotic grasping. To assess this, we perform grasping trials in a physics-based simulator, using image-based pose estimates to guide a parallel gripper and an underactuated robotic hand in picking up 3D models of objects. Our experiments on a subset of the BOP (Benchmark for 6D Object Pose Estimation) dataset compare five open-source object pose estimators and provide insights that were missing from the literature.
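To make the link between pose accuracy and grasping concrete, the following minimal sketch shows one way an estimated object pose could place a grasp for open-loop execution, and how the pose error against ground truth could be measured. It is an illustration under stated assumptions, not the paper's implementation: the function names (`make_pose`, `transfer_grasp`, `pose_error`), the camera-frame convention, and the example numbers are all hypothetical.

```python
import math

def rot_z(theta):
    """3x3 rotation about the z-axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def make_pose(R, t):
    """4x4 homogeneous transform from a 3x3 rotation R and a translation t."""
    return [R[0] + [t[0]], R[1] + [t[1]], R[2] + [t[2]], [0.0, 0.0, 0.0, 1.0]]

def matmul(A, B):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def invert_pose(T):
    """Inverse of a rigid transform: rotation R^T, translation -R^T t."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [T[i][3] for i in range(3)]
    t_inv = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return make_pose(Rt, t_inv)

def transfer_grasp(T_cam_obj, T_obj_grasp):
    """Grasp pose in the camera frame implied by an object pose.

    T_obj_grasp is a reference grasp expressed in the object frame
    (assumed here to have been generated using the ground-truth pose).
    """
    return matmul(T_cam_obj, T_obj_grasp)

def pose_error(T_gt, T_est):
    """Translation error (input units) and rotation error (radians)."""
    dt = math.sqrt(sum((T_gt[i][3] - T_est[i][3]) ** 2 for i in range(3)))
    D = matmul(invert_pose(T_gt), T_est)      # relative transform gt -> est
    trace = D[0][0] + D[1][1] + D[2][2]
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp for acos
    return dt, math.acos(cos_angle)

# Hypothetical example: the estimate is off by 1 cm in x and 5 degrees in yaw.
T_gt = make_pose(rot_z(0.0), [0.0, 0.0, 0.5])
T_est = make_pose(rot_z(math.radians(5.0)), [0.01, 0.0, 0.5])
T_obj_grasp = make_pose(rot_z(0.0), [0.0, 0.0, 0.1])  # grasp 10 cm along object z

grasp_ref = transfer_grasp(T_gt, T_obj_grasp)   # where the grasp should be
grasp_cmd = transfer_grasp(T_est, T_obj_grasp)  # where an open-loop policy would go
dt, dtheta = pose_error(T_gt, T_est)
print(f"translation error: {dt:.3f} m, rotation error: {math.degrees(dtheta):.1f} deg")
```

Because an open-loop policy receives no corrective feedback, any offset between `grasp_cmd` and `grasp_ref` is carried directly into the gripper placement. Whether that offset still yields a successful grasp depends on the gripper and the object geometry, which is the question the simulated trials address.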
