SuperQ-GRASP: Superquadrics-based Grasp Pose Estimation on Larger Objects for Mobile-Manipulation

Xun Tu; Karthik Desingh

TL;DR

SuperQ-GRASP introduces a geometric grasping pipeline that models large objects by reconstructing a mesh from multi-view RGB images with a NeRF-based approach and decomposing it into superquadrics (SQs). Grasp candidates are generated on each SQ via principled sampling and then validated for collision avoidance and near-antipodal stability, enabling proximal and reliable grasps for mobile manipulation. Across synthetic and real objects, the method demonstrates improved proximity and validity of grasps compared to baselines, and real-world experiments on Spot show robustness to viewpoint variation, though pose estimation accuracy remains a key bottleneck. The work advances grasp planning for non-tabletop, high-genus objects by combining implicit modeling, primitive-based representation, and efficient grasp sampling into a cohesive pipeline with demonstrated practical impact for mobile manipulation tasks.
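To make the "near-antipodal" validation step concrete, here is a minimal sketch of one common way to test a two-finger contact pair. It is not taken from the paper: the function name `is_near_antipodal`, the outward-normal convention, and the 15-degree tolerance are assumptions chosen for illustration.

```python
import numpy as np

def is_near_antipodal(p1, n1, p2, n2, max_angle_deg=15.0):
    """Check whether two contacts form a near-antipodal pair.

    p1, p2: contact points; n1, n2: outward surface normals at those points.
    max_angle_deg is a hypothetical tolerance, not a value from the paper.
    """
    p1, n1, p2, n2 = (np.asarray(v, dtype=float) for v in (p1, n1, p2, n2))
    axis = p2 - p1
    length = np.linalg.norm(axis)
    if length == 0.0:
        return False  # coincident contacts cannot define a grasp axis
    axis = axis / length
    n1 = n1 / np.linalg.norm(n1)
    n2 = n2 / np.linalg.norm(n2)
    cos_limit = np.cos(np.radians(max_angle_deg))
    # The outward normal at p1 should point away from p2 (along -axis),
    # and the outward normal at p2 should point away from p1 (along +axis).
    return bool(np.dot(-axis, n1) >= cos_limit and np.dot(axis, n2) >= cos_limit)
```

For example, two contacts on opposite faces of a box, with normals pointing outward along the line joining them, pass the check, while a pair whose second normal is perpendicular to that line fails it.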

Abstract

Grasp planning and estimation have long been a research problem in robotics, with two main approaches to finding graspable poses on objects: 1) the geometric approach, which relies on 3D models of the object and the gripper to estimate valid grasp poses, and 2) the data-driven, learning-based approach, in which models are trained to identify grasp poses from raw sensor observations. The latter assumes comprehensive geometric coverage during the training phase. However, the data-driven approach is typically biased toward tabletop scenarios and struggles to generalize to out-of-distribution scenarios with larger objects (e.g., a chair). Additionally, raw sensor data (e.g., RGB-D data) from a single view of these larger objects is often incomplete and necessitates additional observations. In this paper, we take a geometric approach, leveraging advancements in object modeling (e.g., NeRF) to build an implicit model from RGB images taken from views around the target object. This model enables the extraction of an explicit mesh model while also capturing the visual appearance from novel viewpoints, which is useful for perception tasks like object detection and pose estimation. We further decompose the NeRF-reconstructed 3D mesh into superquadrics (SQs), parametric geometric primitives that are each mapped to a set of precomputed grasp poses, allowing grasp composition on the target object based on these primitives. Our proposed pipeline addresses two problems: a) noisy depth and incomplete views of the object, through the modeling step, and b) generalization to objects of any size. For more qualitative results, refer to the supplementary video and webpage: https://bit.ly/3ZrOanU
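As a brief illustration of why superquadrics are convenient "parametric geometric primitives", the sketch below evaluates the standard superquadric inside-outside function in the primitive's canonical frame. The function name and argument layout are assumptions made for this example; the abstract does not specify how the paper parameterizes or fits its SQs.

```python
import numpy as np

def superquadric_inside_outside(points, scale, eps):
    """Evaluate the standard superquadric inside-outside function F.

    points: array of shape (..., 3) in the superquadric's canonical frame.
    scale:  (a1, a2, a3) semi-axis lengths.
    eps:    (e1, e2) shape exponents; (1, 1) gives an ellipsoid.
    F < 1 means inside, F == 1 on the surface, F > 1 outside.
    """
    a1, a2, a3 = scale
    e1, e2 = eps
    # The shape is symmetric about each axis, so absolute values keep
    # the fractional powers well defined.
    p = np.abs(np.asarray(points, dtype=float))
    x, y, z = p[..., 0] / a1, p[..., 1] / a2, p[..., 2] / a3
    xy_term = (x ** (2.0 / e2) + y ** (2.0 / e2)) ** (e2 / e1)
    return xy_term + z ** (2.0 / e1)
```

With only five shape parameters (three scales, two exponents) plus a pose, one primitive can range from an ellipsoid to a box-like or cylinder-like part, which is what makes a per-primitive set of grasp poses plausible to precompute.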
