OrbitGrasp: $SE(3)$-Equivariant Grasp Learning

Boce Hu, Xupeng Zhu, Dian Wang, Zihao Dong, Haojie Huang, Chenghao Wang, Robin Walters, Robert Platt

TL;DR

This paper proposes a framework for detecting grasp poses from point cloud input. It uses a UNet-style encoder-decoder architecture to increase the number of points the model can handle, and it significantly outperforms baselines in both simulation and physical experiments.

Abstract

While grasp detection is an important part of any robotic manipulation pipeline, reliable and accurate grasp detection in $SE(3)$ remains a research challenge. Many robotics applications in unstructured environments such as the home or warehouse would benefit substantially from better grasp performance. This paper proposes a novel framework for detecting $SE(3)$ grasp poses based on point cloud input. Our main contribution is an $SE(3)$-equivariant model that maps each point in the cloud to a continuous grasp quality function over the 2-sphere $S^2$ using spherical harmonic basis functions. Compared with reasoning about a finite set of samples, this formulation improves the accuracy and efficiency of our model when a large number of samples would otherwise be needed. To accomplish this, we propose a novel variation on EquiFormerV2 that leverages a UNet-style encoder-decoder architecture to enlarge the number of points the model can handle. Our resulting method, which we name $\textit{OrbitGrasp}$, significantly outperforms baselines in both simulation and physical experiments.
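
For orientation, a grasp quality function over $S^2$ expressed in a spherical harmonic basis has the standard band-limited form below. The coefficient symbol $c_{p,\ell}^{m}$ and the cutoff degree $L$ are notation assumed here for illustration; the paper's own equation may use different symbols and normalization.

$$f_p(u) = \sum_{\ell=0}^{L} \sum_{m=-\ell}^{\ell} c_{p,\ell}^{m}\, Y_{\ell}^{m}(u), \qquad u \in S^2$$

Under this form, the model only needs to output the $(L+1)^2$ coefficients for each point $p$, and the quality of any direction $u$ can then be evaluated without enumerating a fixed set of samples in advance.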

Paper Structure (22 sections, 2 equations, 11 figures, 5 tables)

Figures (11)

  • Figure 1: We infer an orbit of grasps (yellow ellipse) defined relative to the surface normal (red arrow) at the contact point (pink dot). Since our model is equivariant over $\mathrm{SO}(3)$, the optimal pose (represented by the solid gripper) on the orbit rotates consistently with the scene (left and right show a rotation by 90 degrees).
  • Figure 2: OrbitGrasp takes the point cloud $B_i$ (a neighborhood around center point $c_i$) as input and outputs a grasp quality function $f_p\colon S^2 \to \mathbb{R}$ for each point $p$ in $B_i$. Specifically, the model produces Fourier coefficients for each $p$ (represented as different channels in the network output), which are used to reconstruct $f_p$ in a spherical harmonic (SH) basis, as in Equation \ref{eqn:spherical_harmonics}. The Orbit Pose Sampler generates multiple poses for each $p$, perpendicular to the surface normal $n_p$, and queries the corresponding $f_p(\cdot)$ to evaluate the grasp quality of each pose along the orbit. The grasp with the highest quality is then selected as the optimal grasp pose $a^*$, as shown on the right.
  • Figure 3: Green and blue denote the $y$ and $z$ directions of the hand, and $n_p$ (red) is the normal vector at $p$. The orbit of the approach direction is shown in black.
  • Figure 4: Real-world experiment setting. (a) Robot platform setup. (b) Top: packed object set with 10 objects. Bottom: packed scene. (c) Top: pile object set with 25 objects. Bottom: pile scene.
  • Figure 5: The mask-centric point cloud representation. Each mask is rendered in a distinct color. Points that belong to multiple masks are rendered with only one color.
  • ...and 6 more figures
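
The query step described in the Figure 2 caption (sample directions on the orbit perpendicular to $n_p$, evaluate $f_p$ from its coefficients, keep the best) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `real_sh_l2`, `sample_orbit`, and `best_grasp_on_orbit`, the degree-2 truncation, the real spherical harmonic convention, and the choice of 36 samples are all assumptions made here, and the coefficients are set by hand rather than predicted by a network. Only the approach direction is handled; the full hand orientation is omitted.

```python
import numpy as np


def real_sh_l2(u):
    """Real spherical harmonics up to degree 2 (assumed truncation).

    u: array of unit vectors, shape (K, 3). Returns shape (K, 9).
    """
    x, y, z = u[:, 0], u[:, 1], u[:, 2]
    c0 = 0.5 * np.sqrt(1.0 / np.pi)
    c1 = np.sqrt(3.0 / (4.0 * np.pi))
    c2 = 0.5 * np.sqrt(15.0 / np.pi)
    return np.stack(
        [
            np.full_like(x, c0),                                 # l=0
            c1 * y, c1 * z, c1 * x,                              # l=1, m=-1,0,1
            c2 * x * y, c2 * y * z,                              # l=2, m=-2,-1
            0.25 * np.sqrt(5.0 / np.pi) * (3.0 * z**2 - 1.0),    # l=2, m=0
            c2 * x * z,                                          # l=2, m=1
            0.5 * c2 * (x**2 - y**2),                            # l=2, m=2
        ],
        axis=1,
    )


def sample_orbit(normal, k=36):
    """Return k unit vectors evenly spaced on the circle perpendicular to `normal`."""
    n = normal / np.linalg.norm(normal)
    # Pick a helper axis that is not parallel to n, then build an orthonormal basis.
    helper = np.array([1.0, 0.0, 0.0]) if abs(n[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    e1 = np.cross(n, helper)
    e1 /= np.linalg.norm(e1)
    e2 = np.cross(n, e1)
    theta = np.linspace(0.0, 2.0 * np.pi, k, endpoint=False)
    return np.cos(theta)[:, None] * e1 + np.sin(theta)[:, None] * e2


def best_grasp_on_orbit(coeffs, normal, k=36):
    """Evaluate f_p on the orbit and return (best direction, its quality)."""
    dirs = sample_orbit(normal, k)
    quality = real_sh_l2(dirs) @ coeffs  # f_p(u) = sum of coeff * basis value
    i = int(np.argmax(quality))
    return dirs[i], float(quality[i])


# Hand-set coefficients (hypothetical): quality grows with the x component.
coeffs = np.zeros(9)
coeffs[3] = 1.0
normal = np.array([0.0, 0.0, 1.0])
best_dir, best_q = best_grasp_on_orbit(coeffs, normal)
print(best_dir, best_q)
```

With these hand-set coefficients the quality is proportional to the $x$ component of the direction, so the selected direction on the orbit around the $z$ axis points along $+x$. In the paper's setting the coefficients would instead come from the network output for each point $p$.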