OrbitGrasp: $SE(3)$-Equivariant Grasp Learning

Boce Hu; Xupeng Zhu; Dian Wang; Zihao Dong; Haojie Huang; Chenghao Wang; Robin Walters; Robert Platt

TL;DR

This paper proposes a novel framework for detecting $SE(3)$ grasp poses from point cloud input. It leverages a UNet-style encoder-decoder architecture to enlarge the number of points the model can handle, and it significantly outperforms baselines in both simulation and physical experiments.
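The TL;DR credits the UNet-style encoder-decoder with letting the model handle more points. As a minimal sketch of one generic way an encoder can shrink a point set and a decoder can restore it, the code below uses farthest point sampling (FPS) on the way down and nearest-neighbor feature copying on the way up. Both functions (`farthest_point_sampling`, `nearest_neighbor_upsample`) are hypothetical illustrations: the source does not say that OrbitGrasp uses FPS or this upsampling rule, and the equivariant layers that would run on the sparse set are omitted.

```python
import math
import random

def farthest_point_sampling(points, k, seed=0):
    """Return k indices chosen greedily so each new point is farthest from those already picked."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    rng = random.Random(seed)
    chosen = [rng.randrange(len(points))]
    # min_dist[i] = distance from point i to its nearest already-chosen point
    min_dist = [math.dist(p, points[chosen[0]]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen

def nearest_neighbor_upsample(dense_points, sparse_points, sparse_features):
    """Give every dense point the feature of its closest sparse point."""
    return [
        sparse_features[min(range(len(sparse_points)),
                            key=lambda j: math.dist(p, sparse_points[j]))]
        for p in dense_points
    ]
```

In this sketch, running the expensive layers on the k sampled points instead of all input points is what makes a larger cloud affordable. A real UNet-style decoder would also merge dense features saved on the way down through skip connections; that step is left out here.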

Abstract

While grasp detection is an important part of any robotic manipulation pipeline, reliable and accurate grasp detection in $SE(3)$ remains a research challenge. Many robotics applications in unstructured environments such as the home or warehouse would benefit substantially from better grasp performance. This paper proposes a novel framework for detecting $SE(3)$ grasp poses based on point cloud input. Our main contribution is an $SE(3)$-equivariant model that maps each point in the cloud to a continuous grasp quality function over the 2-sphere $S^2$ using spherical harmonic basis functions. Compared with reasoning about a finite set of samples, this formulation improves the accuracy and efficiency of our model when a large number of samples would otherwise be needed. To accomplish this, we propose a novel variation on EquiFormerV2 that leverages a UNet-style encoder-decoder architecture to enlarge the number of points the model can handle. Our resulting method, which we name $\textit{OrbitGrasp}$, significantly outperforms baselines in both simulation and physical experiments.
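The abstract's central idea is that each point $p$ gets a continuous quality function on $S^2$ expressed in spherical harmonics, of the form $f_p(x) = \sum_{l=0}^{L} \sum_{m=-l}^{l} c_l^m(p)\, Y_l^m(x)$, instead of scores for a fixed set of samples. The minimal sketch below evaluates such a function from its coefficients. The truncation degree $L = 2$ (9 coefficients), the real-SH convention and ordering, and the names `real_sh_basis` and `grasp_quality` are assumptions for illustration, not OrbitGrasp's actual settings.

```python
import math

def real_sh_basis(x, y, z):
    """Real spherical harmonics Y_l^m for l = 0, 1, 2 at a unit vector (x, y, z).

    Ordered [Y_0^0, Y_1^-1, Y_1^0, Y_1^1, Y_2^-2, Y_2^-1, Y_2^0, Y_2^1, Y_2^2].
    This is one common real-SH convention, assumed here for illustration.
    """
    c0 = 0.5 * math.sqrt(1.0 / math.pi)
    c1 = math.sqrt(3.0 / (4.0 * math.pi))
    c2 = 0.5 * math.sqrt(15.0 / math.pi)
    c20 = 0.25 * math.sqrt(5.0 / math.pi)
    return [
        c0,
        c1 * y, c1 * z, c1 * x,
        c2 * x * y, c2 * y * z, c20 * (3.0 * z * z - 1.0), c2 * x * z,
        0.5 * c2 * (x * x - y * y),
    ]

def grasp_quality(coeffs, direction):
    """Evaluate f_p(direction) = sum_{l,m} c_l^m Y_l^m(direction) from 9 coefficients."""
    norm = math.sqrt(sum(v * v for v in direction))
    x, y, z = (v / norm for v in direction)  # project the query onto S^2
    basis = real_sh_basis(x, y, z)
    if len(coeffs) != len(basis):
        raise ValueError(f"expected {len(basis)} coefficients, got {len(coeffs)}")
    return sum(c * b for c, b in zip(coeffs, basis))
```

For example, `grasp_quality([0, 0, 1, 0, 0, 0, 0, 0, 0], (0, 0, 1))` scores the direction $+z$ using only the $Y_1^0$ term. Because $f_p$ is closed-form, this sketch can query any direction with one dot product against the basis, so a dense set of candidate directions costs no extra network outputs.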

Paper Structure

This paper contains 22 sections, 2 equations, 11 figures, and 5 tables.

Introduction
Related Work
Background
Method
Experiments
Comparison With Baseline Methods in Simulation
Physical Experiments
Ablation Study
Conclusion and Limitations
Mask-Based Sample Generation
Model Architecture
Additional Details in Inferring Grasp Pose
Simulation Additional Details
Efficiency and Inference Time Analysis
Training Data Efficiency
...and 7 more sections

Figures (11)

Figure 1: We infer an orbit of grasps (yellow ellipse) defined relative to the surface normal (red arrow) at the contact point (pink dot). Since our model is equivariant over $\mathrm{SO}(3)$, the optimal pose (represented by the solid gripper) on the orbit rotates consistently with the scene (the left and right panels differ by a 90-degree rotation).
Figure 2: OrbitGrasp takes the point cloud $B_i$ (a neighborhood around center point $c_i$) as input and outputs a grasp quality function $f_p\colon S^2 \to \mathbb{R}$ for each point $p$ in $B_i$. Specifically, the model produces Fourier coefficients for each $p$ (represented as different channels in the network output), which are used to reconstruct $f_p$ following the paper's spherical harmonic (SH) expansion equation. The Orbit Pose Sampler generates multiple poses for each $p$ on the orbit perpendicular to the surface normal $n_p$ and queries the corresponding $f_p(\cdot)$ to evaluate grasp quality along the orbit. The grasp with the highest quality is then selected as the optimal grasp pose $a^*$, as shown on the right.
Figure 3: Green and blue denote the $y$ and $z$ directions of the hand, respectively, and $n_p$ (red) is the normal vector at $p$. Black denotes the orbit of the approach direction.
Figure 4: Real-world experiment setting. (a) Robot platform setup. (b) Top: packed object set with 10 objects. Bottom: packed scene. (c) Top: pile object set with 25 objects. Bottom: pile scene.
Figure 5: The mask-centric point cloud representation. Each mask is rendered in a distinct color. Points that belong to multiple masks are rendered with only one color.
...and 6 more figures
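Figures 2 and 3 describe the Orbit Pose Sampler: for each point $p$, candidate grasps lie on an orbit perpendicular to the normal $n_p$, each candidate is scored with $f_p$, and the best one becomes $a^*$. The minimal sketch below follows that loop under several stated assumptions: $f_p$ is passed in as a callable taking the approach direction, the orbit is sampled at `k = 36` evenly spaced angles, and the hand frame puts $y$ along $n_p$ and $z$ along the approach direction. All function names here are hypothetical, and the sampling density and axis convention are not taken from the paper.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _normalize(v):
    length = math.sqrt(_dot(v, v))
    return tuple(x / length for x in v)

def orbit_directions(normal, k=36):
    """k unit vectors evenly spaced on the circle perpendicular to `normal`."""
    n = _normalize(normal)
    # Any helper axis not (nearly) parallel to n yields a valid in-plane basis.
    helper = (1.0, 0.0, 0.0) if abs(n[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = _normalize(_cross(n, helper))
    v = _cross(n, u)  # unit length because n and u are orthonormal
    return [tuple(math.cos(t) * ui + math.sin(t) * vi for ui, vi in zip(u, v))
            for t in (2.0 * math.pi * i / k for i in range(k))]

def hand_rotation(normal, approach):
    """Row-major 3x3 rotation whose columns are the hand x, y, z axes.

    Assumed convention: y = unit normal n_p, z = approach direction on the orbit,
    x = y cross z so the frame is right-handed.
    """
    y = _normalize(normal)
    z = _normalize(approach)
    if abs(_dot(y, z)) > 1e-6:
        raise ValueError("approach must be perpendicular to the normal")
    x = _cross(y, z)
    return [[x[r], y[r], z[r]] for r in range(3)]

def select_best_grasp(points, normals, quality_fns, k=36):
    """Score every orbit sample of every point; return (point, rotation, score) of the best."""
    best = None
    for p, n, f in zip(points, normals, quality_fns):
        for d in orbit_directions(n, k):
            score = f(d)
            if best is None or score > best[2]:
                best = (p, hand_rotation(n, d), score)
    return best
```

Because the gripper is symmetric, using $-n_p$ instead of $n_p$ for the $y$ axis would describe the same physical grasp; the sketch simply fixes one sign.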
