Benchmarking the Effects of Object Pose Estimation and Reconstruction on Robotic Grasping Success

Varun Burde, Pavel Burget, Torsten Sattler

TL;DR

A large-scale, physics-based benchmark that evaluates 6D pose estimators and 3D mesh models based on their functional efficacy in grasping provides insight into how perception systems relate to object manipulation using robots.

Abstract

3D reconstruction serves as the foundational layer for numerous robotic perception tasks, including 6D object pose estimation and grasp pose generation. Modern 3D reconstruction methods for objects can produce visually and geometrically impressive meshes from multi-view images, yet standard geometric evaluations do not reflect how reconstruction quality influences downstream tasks such as robotic manipulation performance. This paper addresses this gap by introducing a large-scale, physics-based benchmark that evaluates 6D pose estimators and 3D mesh models based on their functional efficacy in grasping. We analyze the impact of model fidelity by generating grasps on various reconstructed 3D meshes and executing them on the ground-truth model, simulating how grasp poses generated with an imperfect model affect interaction with the real object. This assesses the combined impact of pose error, grasp robustness, and geometric inaccuracies from 3D reconstruction. Our results show that reconstruction artifacts significantly decrease the number of grasp pose candidates but have a negligible effect on grasping performance given an accurately estimated pose. Our results also reveal that the relationship between grasp success and pose error is dominated by spatial error, and even a simple translation error provides insight into the success of the grasping pose of symmetric objects. This work provides insight into how perception systems relate to object manipulation using robots.

Paper Structure (26 sections, 4 equations, 5 figures)

Figures (5)

  • Figure 1: Overview of our evaluation pipeline. First, a canonical grasp library is pre-computed for each object. Then, for a given scene, a pose estimator provides $T_{c2o}^{est}$. This pose is used to derive a target gripper pose, $T_{w2g}^{est}$, which is executed on the ground-truth object. The outcome is recorded to calculate the Estimated Success Rate ($S_{est}$) (Sec. \ref{sec:metric_sest}) and correlated with the initial pose error.
  • Figure 2: Baseline gripper performance analysis, visualizing the Grasp Generation Success Rate ($S_{gen}$) (Sec. \ref{sec:metric_sgen}) across various grippers and objects under ideal conditions. (a) Per-object $S_{gen}$ for each gripper. (b) Distribution of the best-performing gripper for each object. (c) Average $S_{gen}$ per gripper. (d) A Physics-Based Outcome Breakdown (Sec. \ref{sec:metric_breakdown}) of grasp failures for each gripper, with gripper jaw widths annotated in millimeters.
  • Figure 3: Analysis of Grasping Performance vs. Pose Estimation Error. Left panel: Scatter plots showing the relation between various pose error metrics and the Estimated Success Rate ($S_{est}$), averaged over both FoundationPose and MegaPose across 8,250 trials and 18,882,842 simulations (Sec. \ref{sec:metric_sest}). Right panel: A detailed Physics-Based Outcome Breakdown (Sec. \ref{sec:metric_breakdown}) of grasp attempts per object. The green portion of each bar represents the final $S_{est}$, while other colors show the proportions of different failure modes.
  • Figure 4: Impact of 3D Model Fidelity on Grasp Candidates. Left panel: The Grasp Generation Success Rate ($S_{gen}$) (Sec. \ref{sec:metric_sgen}) for various reconstruction methods. Right panel: A Physics-Based Outcome Breakdown (Sec. \ref{sec:metric_breakdown}) for grasps planned on these meshes. Note the significant increase in 'Collision' failures for lower-quality models.
  • Figure 5: Comparative Analysis of Grasping Success under Compounded Errors. This figure compares the final grasping success, measured by the Estimated Success Rate ($S_{est}$) (Sec. \ref{sec:metric_sest}), when combining different sources of geometric and pose uncertainty. Left panel: Performance under the 'GT $\to$ Reconstructed mesh' condition. Right panel: Performance under the 'Reconstructed mesh $\to$ GT' condition.
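
The pipeline in Figure 1 can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that $T_{a2b}$ denotes the 4x4 homogeneous pose of frame $b$ expressed in frame $a$, so that a known camera pose $T_{w2c}$, the estimated object pose $T_{c2o}^{est}$, and a canonical grasp $T_{o2g}$ chain into $T_{w2g}^{est}$. The helper names (`make_pose`, `target_gripper_pose`, `translation_error`, `success_rate`) and the outcome labels are hypothetical, and the success rate is taken here simply as the fraction of trials labeled successful.

```python
import numpy as np


def make_pose(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T


def target_gripper_pose(T_w2c, T_c2o_est, T_o2g):
    """Chain world->camera, camera->object (estimated), object->gripper.

    Assumption: T_a2b is the pose of frame b expressed in frame a,
    so the transforms compose by left-to-right matrix multiplication.
    """
    return T_w2c @ T_c2o_est @ T_o2g


def translation_error(T_est, T_gt):
    """Euclidean distance between the translation parts of two poses."""
    return float(np.linalg.norm(T_est[:3, 3] - T_gt[:3, 3]))


def success_rate(outcomes):
    """Fraction of trials labeled 'success' (hypothetical label name)."""
    outcomes = list(outcomes)
    if not outcomes:
        return 0.0
    return sum(1 for o in outcomes if o == "success") / len(outcomes)


# Illustrative values only: camera at the world origin, object 0.5 m ahead,
# and an estimated pose that is off by 1 cm along x.
T_w2c = np.eye(4)
T_c2o_gt = make_pose(np.eye(3), [0.0, 0.0, 0.5])
T_c2o_est = make_pose(np.eye(3), [0.01, 0.0, 0.5])
T_o2g = make_pose(np.eye(3), [0.0, 0.0, 0.1])  # one canonical grasp

T_w2g_gt = target_gripper_pose(T_w2c, T_c2o_gt, T_o2g)
T_w2g_est = target_gripper_pose(T_w2c, T_c2o_est, T_o2g)
grasp_offset = translation_error(T_w2g_est, T_w2g_gt)
```

With identity rotations, the 1 cm object translation error carries over unchanged to the gripper target. With a rotation error, the offset at the gripper would also depend on the lever arm between the object origin and the grasp point.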