Is Image-based Object Pose Estimation Ready to Support Grasping?

Eric C. Joyce, Qianwen Zhao, Nathaniel Burgdorfer, Long Wang, Philippos Mordohai

TL;DR

The paper evaluates whether 6-DoF object poses estimated from a single RGB image can effectively guide robotic grasping. It introduces a physics-based evaluation framework that couples RGB-based pose estimators (DOPE, NCF, EPOS, ZebraPose, GDRNPP) with both a parallel gripper and an underactuated hand in MuJoCo, using ground-truth poses to generate reference grasps and measuring grasp success under an open-loop policy. Results show that improvements in pose accuracy generally boost grasp success for simpler shapes, but performance degrades with complex geometries, highlighting the critical roles of gripper design and object geometry. The study concludes that state-of-the-art RGB pose estimators are necessary but not sufficient; gripper selection and object type strongly influence practical grasping viability, guiding future integration of perception with manipulation for RGB-only setups.

Abstract

We present a framework for evaluating 6-DoF instance-level object pose estimators, focusing on those that require a single RGB (not RGB-D) image as input. Besides gaining intuition about how accurate these estimators are, we are interested in the degree to which they can serve as the sole perception mechanism for robotic grasping. To assess this, we perform grasping trials in a physics-based simulator, using image-based pose estimates to guide a parallel gripper and an underactuated robotic hand in picking up 3D models of objects. Our experiments on a subset of the BOP (Benchmark for 6D Object Pose Estimation) dataset compare five open-source object pose estimators and provide insights that were missing from the literature.
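
One way to picture "using image-based pose estimates to guide" a gripper is to express a reference grasp in the object's own frame and then re-anchor it on the estimated object pose. The sketch below illustrates that idea in plain Python; it is an assumption about how such a transfer could be done, not the authors' implementation. The names `retarget_grasp`, `invert_rigid`, `matmul` and `translation` are hypothetical, and poses are assumed to be 4x4 homogeneous rigid transforms in a shared world frame.

```python
def matmul(A, B):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def invert_rigid(T):
    """Invert a rigid transform [R t; 0 1] as [R^T, -R^T t; 0 1]."""
    R_T = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [T[i][3] for i in range(3)]
    t_inv = [-sum(R_T[i][j] * t[j] for j in range(3)) for i in range(3)]
    return [R_T[i] + [t_inv[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def translation(x, y, z):
    """Helper (hypothetical): a pure-translation pose."""
    return [[1.0, 0.0, 0.0, x],
            [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]


def retarget_grasp(T_world_grasp_ref, T_world_obj_gt, T_world_obj_est):
    """Move a reference grasp from the ground-truth pose to the estimated pose.

    Assumption: the reference grasp was defined against the ground-truth
    object pose, so it is first expressed in the object frame and then
    mapped back to the world through the estimated pose.
    """
    T_obj_grasp = matmul(invert_rigid(T_world_obj_gt), T_world_grasp_ref)
    return matmul(T_world_obj_est, T_obj_grasp)


# Example: the estimate is off by 2 cm along x, so the grasp shifts by 2 cm.
gt = translation(1.0, 0.0, 0.0)
est = translation(1.02, 0.0, 0.0)
ref_grasp = translation(1.0, 0.0, 0.1)
commanded = retarget_grasp(ref_grasp, gt, est)
```

Under an open-loop policy nothing corrects the commanded grasp after it is computed, so any pose error carries straight into the gripper's target.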

Paper Structure

This paper contains 18 sections, 8 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Pose estimates as green overlays and their corresponding ground-truth poses as solid objects. All estimates here measured better than average ADD(-S) and MSSD and yet exhibit significant rotation and translation errors. Our trials attempt to grasp according to the estimated poses, and all estimates seen here were poor enough to cause grasping trial failures.
  • Figure 2: Breakdown of different stages of a simulated grasping task using a simplified open-loop control policy.
  • Figure 3: Example reference grasps for selected objects in the LM-O dataset (a-h) and YCB-V dataset (i-o). All grasping trials are attempted with both the parallel gripper and the underactuated hand.
  • Figure 4: Cumulative distribution curves for grasp failure rate as a function of our four metrics. These curves average together all objects, for all estimators. Dashed lines are for the parallel gripper, while solid lines are for the underactuated hand. The metric with least area under its curve is the strongest predictor for grasp success. Here we see the overall superiority of the underactuated hand, the pronounced tolerance to rotation error, and the correlation between translation error and the two BOP metrics.
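
The captions above refer to ADD(-S) and MSSD. For readers unfamiliar with these metrics, the following is a minimal sketch of their commonly used definitions, written in plain Python for clarity. It is not taken from the paper's code: the function names, the representation of a pose as a 3x3 rotation `R` plus a translation `t`, and the explicit `symmetries` list are all assumptions made for illustration.

```python
import math


def transform(R, t, p):
    """Apply rotation R (3x3 nested list) and translation t to point p."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def add(points, R_gt, t_gt, R_est, t_est):
    """ADD: mean distance between corresponding model points under both poses."""
    dists = [math.dist(transform(R_gt, t_gt, p), transform(R_est, t_est, p))
             for p in points]
    return sum(dists) / len(dists)


def add_s(points, R_gt, t_gt, R_est, t_est):
    """ADD-S: mean distance from each ground-truth point to its nearest
    estimated point (used for symmetric objects)."""
    gt = [transform(R_gt, t_gt, p) for p in points]
    est = [transform(R_est, t_est, p) for p in points]
    return sum(min(math.dist(g, e) for e in est) for g in gt) / len(gt)


def mssd(points, R_gt, t_gt, R_est, t_est, symmetries):
    """MSSD: maximum point distance, minimized over the object's symmetries.

    `symmetries` is a list of (R_s, t_s) pairs and should include the identity.
    """
    best = math.inf
    for R_s, t_s in symmetries:
        worst = max(
            math.dist(transform(R_est, t_est, p),
                      transform(R_gt, t_gt, transform(R_s, t_s, p)))
            for p in points)
        best = min(best, worst)
    return best


# Toy example: a two-point object that looks identical after a
# 180-degree turn about the z axis.
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
RZ180 = [[-1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 1.0]]
ZERO = (0.0, 0.0, 0.0)
model = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]

err_add = add(model, I3, ZERO, RZ180, ZERO)
err_add_s = add_s(model, I3, ZERO, RZ180, ZERO)
err_mssd = mssd(model, I3, ZERO, RZ180, ZERO, [(I3, ZERO), (RZ180, ZERO)])
```

In the toy example the estimate is flipped by 180 degrees, so ADD reports a large error while ADD-S and MSSD, which account for the symmetry, report none. All three are distances between model surfaces, so none of them measures directly whether a gripper closing at the estimated pose would actually hold the object.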