MG-Grasp: Metric-Scale Geometric 6-DoF Grasping Framework with Sparse RGB Observations

Kangxu Wang, Siang Chen, Chenxing Jiang, Shaojie Shen, Yixiang Dai, Guijin Wang

Abstract

Single-view RGB-D grasp detection remains a common choice in 6-DoF robotic grasping systems, but it typically requires a depth sensor. While RGB-only 6-DoF grasp methods have been studied recently, their inaccurate geometric representations are not directly suitable for physical robotic manipulation, which hinders reliable grasp generation. To address these limitations, we propose MG-Grasp, a novel depth-free 6-DoF grasping framework that achieves high-quality object grasping. Leveraging a two-view 3D foundation model together with camera intrinsics and extrinsics, our method reconstructs metric-scale, multi-view consistent dense point clouds from sparse RGB images and generates stable 6-DoF grasps. Experiments on the GraspNet-1Billion dataset and in the real world demonstrate that MG-Grasp achieves state-of-the-art (SOTA) grasp performance among RGB-based 6-DoF grasping methods.

Paper Structure (18 sections, 15 equations, 5 figures, 6 tables)

Figures (5)

  • Figure 1: Pipeline of MG-Grasp: given sparse posed RGB images, we first perform Depth Aggregation and Dense Correspondence to obtain up-to-scale pointmaps, and then recover metric scale via Triangulation-based Scale Recovery. The metric pointmaps are further refined with multi-view consistency optimization to produce consistent geometry, which is finally used for grasp guidance and 6-DoF grasp generation.
  • Figure 2: Left: metrically scaled pointmaps before refinement; right: refined pointmaps. The highlighted regions (red boxes) illustrate misaligned surface layers before refinement, which are largely removed after refinement, resulting in multi-view consistent geometry for grasping.
  • Figure 3: Comparison of unprojected point clouds produced by different geometry pipelines. The highlighted regions illustrate misaligned surfaces.
  • Figure 4: Effect of the number of input views on GraspNet grasping performance (AP).
  • Figure 5: Real-world setting.
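
The Triangulation-based Scale Recovery step named in Figure 1 can be illustrated with a minimal sketch. The excerpt above does not give the paper's actual estimator, so everything here is an assumption for illustration: matched pixels from two posed views are triangulated with a linear DLT solve, and the scale is taken as the median ratio between triangulated depths and the up-to-scale depths. The function names (`project`, `triangulate_dlt`, `recover_scale`) and the synthetic scene are hypothetical.

```python
import numpy as np


def project(K, T_w2c, pts_w):
    """Project world points into a camera; returns pixels (N,2) and depths (N,)."""
    pts_c = pts_w @ T_w2c[:3, :3].T + T_w2c[:3, 3]
    uvw = pts_c @ K.T
    return uvw[:, :2] / uvw[:, 2:3], pts_c[:, 2]


def triangulate_dlt(K, T_w2c_a, T_w2c_b, uv_a, uv_b):
    """Linear (DLT) triangulation of matched pixels from two posed views.

    K is the shared 3x3 intrinsic matrix, T_w2c_* are 4x4 world-to-camera
    extrinsics, uv_* are (N,2) pixel coordinates. Returns (N,3) world points.
    """
    P_a = K @ T_w2c_a[:3]  # 3x4 projection matrix of view A
    P_b = K @ T_w2c_b[:3]  # 3x4 projection matrix of view B
    points = np.empty((len(uv_a), 3))
    for i, ((ua, va), (ub, vb)) in enumerate(zip(uv_a, uv_b)):
        # Each view contributes two linear constraints on the homogeneous point.
        A = np.stack([
            ua * P_a[2] - P_a[0],
            va * P_a[2] - P_a[1],
            ub * P_b[2] - P_b[0],
            vb * P_b[2] - P_b[1],
        ])
        _, _, vt = np.linalg.svd(A)
        X = vt[-1]  # null-space direction = homogeneous solution
        points[i] = X[:3] / X[3]
    return points


def recover_scale(metric_depth, relative_depth):
    """Assumed estimator: median ratio of metric to up-to-scale depths."""
    valid = (metric_depth > 0) & (relative_depth > 0)
    return float(np.median(metric_depth[valid] / relative_depth[valid]))


# Synthetic two-view setup (all values are illustrative assumptions).
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
T_a = np.eye(4)                      # view A at the world origin
T_b = np.eye(4)
T_b[0, 3] = -0.1                     # view B shifted 0.1 m along +x in world

rng = np.random.default_rng(0)
pts_w = np.column_stack([
    rng.uniform(-0.2, 0.2, 50),
    rng.uniform(-0.2, 0.2, 50),
    rng.uniform(0.4, 0.8, 50),
])

uv_a, depth_a_true = project(K, T_a, pts_w)
uv_b, _ = project(K, T_b, pts_w)

true_scale = 2.5
relative_depth_a = depth_a_true / true_scale   # stand-in for an up-to-scale pointmap

pts_tri = triangulate_dlt(K, T_a, T_b, uv_a, uv_b)
_, metric_depth_a = project(K, T_a, pts_tri)   # triangulated depths in view A
scale = recover_scale(metric_depth_a, relative_depth_a)
print(f"recovered scale: {scale:.4f}")
```

Multiplying the up-to-scale pointmap by the recovered factor yields metric coordinates. The median is used here only because it tolerates a few bad correspondences; a real pipeline would also need to handle noisy matches and near-degenerate baselines.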