Pyramid-Monozone Synergistic Grasping Policy in Dense Clutter

Chenghao Li, Nak Young Chong

TL;DR

This work proposes the Pyramid-Monozone Synergistic Grasping Policy (PMSGP), which enables robots to handle occlusions effectively during grasping. Across more than 7,000 real-world grasps in densely cluttered scenes, PMSGP significantly outperforms seven competitive grasping methods.

Abstract

Grasping a diverse range of novel objects in dense clutter poses a great challenge to robotic automation, mainly because of occlusion. In this work, we propose the Pyramid-Monozone Synergistic Grasping Policy (PMSGP), which enables robots to handle occlusions effectively during grasping. Specifically, we first construct the Pyramid Sequencing Policy (PSP) to sequence the objects in a cluttered scene into a pyramid structure. Isolating objects layer by layer lets the grasp detection model focus on a single layer during each grasp. We then devise the Monozone Sampling Policy (MSP) to sample grasp candidates in the top layer. In this way, each grasp targets the topmost object, which avoids most occlusions. We performed more than 7,000 real-world grasps in densely cluttered scenes with 300 novel objects, demonstrating that PMSGP significantly outperforms seven competitive grasping methods. More importantly, we tested the grasping performance of PMSGP in extremely cluttered scenes involving 100 different household goods and found that PMSGP pushed the grasp success rate to 84.9%. To the best of our knowledge, no previous work has demonstrated similar performance. All grasping videos are available at: https://www.youtube.com/@chenghaoli4532/playlists.
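To make the "topmost object first" idea concrete, the following is a minimal sketch, not the authors' PSP or MSP. It assumes a top-down depth image in which smaller values are closer to the camera, and a per-pixel label image in which 0 marks background; the names `topmost_object` and `grasp_order` and the label encoding are hypothetical. It also assumes a static depth image, whereas a real system would re-image the scene after every grasp.

```python
from typing import List, Optional


def topmost_object(depth: List[List[float]], labels: List[List[int]]) -> Optional[int]:
    """Return the id of the labelled object closest to a top-down camera.

    depth[r][c]  : camera-to-surface distance (smaller means higher in the pile)
    labels[r][c] : object id per pixel, 0 = background (hypothetical encoding)
    """
    best_id: Optional[int] = None
    best_depth = float("inf")
    for depth_row, label_row in zip(depth, labels):
        for d, obj in zip(depth_row, label_row):
            if obj != 0 and d < best_depth:
                best_id, best_depth = obj, d
    return best_id


def grasp_order(depth: List[List[float]], labels: List[List[int]]) -> List[int]:
    """Pick the topmost object, remove it from the label image, and repeat."""
    remaining = [row[:] for row in labels]  # copy so the caller's labels stay intact
    order: List[int] = []
    while True:
        obj = topmost_object(depth, remaining)
        if obj is None:
            return order
        order.append(obj)
        # "Grasp" the object: erase its pixels so the next layer becomes the top.
        remaining = [[0 if v == obj else v for v in row] for row in remaining]


# Toy 3x3 scene: object 3 sits on top, then object 1, then object 2.
depth = [[0.50, 0.50, 0.62],
         [0.55, 0.40, 0.62],
         [0.70, 0.70, 0.62]]
labels = [[1, 1, 2],
          [1, 3, 2],
          [0, 0, 2]]
print(grasp_order(depth, labels))  # [3, 1, 2]
```

Ranking objects by their minimum depth is only one possible ordering criterion; it is used here because it is the simplest way to express "grasp what is on top".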

Paper Structure

This paper contains 25 sections, 12 equations, 10 figures, 5 tables, and 2 algorithms.

Figures (10)

  • Figure 1: Grasping in dense clutter: Objects occlude one another (highlighted with green lines) and are only partially visible, which can easily lead to failed grasps.
  • Figure 2: Pipeline of PMSGP: First, Top View Alignment (TVA) centers the initial view $V$ of the depth camera on the topmost object to obtain view $V'''$, and the topmost object is segmented using the center $V'''(c)$ of this view (green point) as a prompt, yielding an initial segmented RGB image (outlined in green) with mask $M_f$. Second, two pairs of most distant points ($p_m$ and $p_m'$ (red points), $p_{m_p}$ and $p_{m_p}'$ (blue points)) are computed from the edge of $M_f$ and used to perform Cross-prompted Segmentation (CPS), which refines $M_f$ into $M_r$. Third, the segmented RGB image with mask $M_r$ and the depth image within view $V'''$ are fed into the grasp detection (GD) model to generate initial grasp candidates $G$, followed by Grasp Angle Calibration (GAC) to obtain calibrated grasp candidates $G'$. $G'$ then passes through Monozone Grasp Analysis (MGA) to find the optimal grasp $g^*$. Finally, Optimal Grasp Refinement (OGR) refines $g^*$ into the final grasp $g^*_f$.
  • Figure 3: Sequence of top view alignment: (a) global view alignment, (b) first local view alignment, (c) second local view alignment, and (d) final aligned view. $V(c)$, $V'(c)$, $V''(c)$, and $V'''(c)$ are denoted by red, green, blue, and purple points, respectively.
  • Figure 4: Visualization of (a) cross-prompted segmentation and (b) grasp angle calibration. In (a), $p_{m_p}$ and $p_{m_p}'$ are denoted by orange points, and $p_{m}$ and $p_{m}'$ by green points. In (b), $p_{t_l}$, $p_{b_l}$, $p_{t_r}$, and $p_{b_r}$ are denoted by orange points, $\mathbf{v}_{p_l}$ and $\mathbf{v}_{p_r}$ by orange lines, $\mathbf{v}_{g_u}$ by a green line, and $\theta'$ and $\theta''$ by white arcs.
  • Figure 5: Visualization of adaptive viewpoint rotation in MGA. The red point, green point, and blue point are rotation center $c_r$, the center $c_i'$ of $g_i'$, and the center $c_i$ of $g_i$, respectively. The rotation angle is denoted as $\theta_c$.
  • ...and 5 more figures
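
Two geometric operations named in the captions above can be sketched in a few lines: finding the most distant pair of points on a mask edge (the $p_m$, $p_m'$ pair of Figure 2) and rotating a grasp center about a rotation center $c_r$ by an angle $\theta_c$ (Figure 5). This is an illustrative sketch under assumptions, not the paper's implementation: the helper names `mask_edge`, `farthest_pair`, and `rotate_about` are hypothetical, the edge is taken to be the foreground pixels with a 4-connected background neighbour, and the pair search is brute force. The second pair ($p_{m_p}$, $p_{m_p}'$) and the way MGA chooses $c_r$ and $\theta_c$ are not defined in the captions, so they are not reproduced here.

```python
import math
from itertools import combinations
from typing import List, Tuple

Point = Tuple[float, float]


def mask_edge(mask: List[List[int]]) -> List[Point]:
    """Return foreground pixels (x, y) that touch background or the image border."""
    h, w = len(mask), len(mask[0])
    edge: List[Point] = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if not (0 <= nx < w and 0 <= ny < h) or not mask[ny][nx]:
                    edge.append((x, y))
                    break
    return edge


def farthest_pair(points: List[Point]) -> Tuple[Point, Point]:
    """Brute-force O(n^2) search for the two points with the largest distance."""
    return max(
        combinations(points, 2),
        key=lambda pq: math.hypot(pq[0][0] - pq[1][0], pq[0][1] - pq[1][1]),
    )


def rotate_about(p: Point, center: Point, theta: float) -> Point:
    """Rotate p about center by theta radians.

    Counter-clockwise in a y-up frame; in image coordinates (y down) the same
    formula appears clockwise on screen.
    """
    dx, dy = p[0] - center[0], p[1] - center[1]
    c, s = math.cos(theta), math.sin(theta)
    return (center[0] + dx * c - dy * s, center[1] + dx * s + dy * c)


# A one-pixel-high horizontal bar: its extreme ends are the farthest pair.
mask = [[0, 0, 0, 0, 0],
        [1, 1, 1, 1, 0],
        [0, 0, 0, 0, 0]]
print(farthest_pair(mask_edge(mask)))  # ((0, 1), (3, 1))
```

For large masks the quadratic search becomes slow; restricting it to the convex hull of the edge points is a common speed-up, since the farthest pair always lies on the hull.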