AO-Grasp: Articulated Object Grasp Generation

Carlota Parés Morlans; Claire Chen; Yijia Weng; Michelle Yi; Yuying Huang; Nick Heppert; Linqi Zhou; Leonidas Guibas; Jeannette Bohg

TL;DR

AO-Grasp addresses the problem of robustly grasping articulated objects by predicting actionable, stable 6-DoF grasps directly from partial point clouds. It introduces the AO-Grasp Dataset (78K grasps on 84 articulated instances across 7 categories) and the AO-Grasp Model, which combines an Actionable Grasp Point Predictor with a CGN-based orientation module to generate grasp proposals without part segmentation. The method achieves an average grasp success rate of 45% in simulation and transfers zero-shot from simulation to the real world, where it succeeds on 67.5% of 120 scenes, outperforming baselines. These results demonstrate the practicality of learning grasp strategies for articulated objects in both simulation and real-world settings, while leaving room for improvements in orientation learning and object diversity.

Abstract

We introduce AO-Grasp, a grasp proposal method that generates 6 DoF grasps that enable robots to interact with articulated objects, such as opening and closing cabinets and appliances. AO-Grasp consists of two main contributions: the AO-Grasp Model and the AO-Grasp Dataset. Given a segmented partial point cloud of a single articulated object, the AO-Grasp Model predicts the best grasp points on the object with an Actionable Grasp Point Predictor. Then, it finds corresponding grasp orientations for each of these points, resulting in stable and actionable grasp proposals. We train the AO-Grasp Model on our new AO-Grasp Dataset, which contains 78K actionable parallel-jaw grasps on synthetic articulated objects. In simulation, AO-Grasp achieves a 45.0% grasp success rate, whereas the highest-performing baseline achieves a 35.0% success rate. Additionally, we evaluate AO-Grasp on 120 real-world scenes of objects with varied geometries, articulation axes, and joint states, where AO-Grasp produces successful grasps on 67.5% of scenes, while the baseline produces successful grasps on only 33.3% of scenes. To the best of our knowledge, AO-Grasp is the first method for generating 6 DoF grasps on articulated objects directly from partial point clouds without requiring part detection or hand-designed grasp heuristics. Project website: https://stanford-iprl-lab.github.io/ao-grasp

Paper Structure (12 sections, 3 equations, 4 figures, 5 tables)

Introduction
Related work
AO-Grasp Dataset
Grasp parametrization and labeling criteria
Grasp sampling
AO-Grasp Model
Actionable Grasp Point Predictor
Predicting grasp orientations
Experimental results
Simulation evaluation
Real-world evaluation
Conclusion

Figures (4)

Figure 1: AO-Grasp consists of (a) the AO-Grasp Dataset, which contains 78K actionable grasps on synthetic articulated objects, and (b) the AO-Grasp Model, which takes a partial point cloud of an articulated object and generates stable and actionable 6 DoF grasps that facilitate downstream manipulation. AO-Grasp not only outperforms baselines in simulation, but also achieves zero-shot sim-to-real transfer (c), enabling interactions with real-world objects with different articulation axes and geometries.
Figure 2: An overview of the AO-Grasp Model. (a) Siamese PointNet++: We find positive and negative correspondences between partial point clouds of an object in joint state $s$ captured from two different object views $v$ and $v'$ to train the network with a hardest contrastive loss $L_{HC}$. (b) Pseudo ground truth heatmap $H_{s,v}$: We supervise training the Actionable Grasp Point Predictor on dense heatmaps computed from grasps in the AO-Grasp Dataset. (c) Actionable Grasp Point Predictor: Given a partial point cloud of an articulated object $P_{s,v}$, the Actionable Grasp Point Predictor outputs a heatmap of grasp-likelihood scores $\hat{H}_{s,v}$. It first processes the partial point cloud $P_{s,v}$ with the pre-trained PointNet++ module, resulting in point features $F_{s,v}$. Then, these features are passed through an MLP, which returns the grasp-likelihood scores $\hat{H}_{s,v}$. (d) Grasp Proposal Generation: We leverage Contact-GraspNet (Sundermeyer et al., 2021) to get per-point grasp orientations. We take the top $k$ scores from $\hat{H}_{s,v}$ to get a set of stable and actionable 6 DoF grasp poses $O_{s,v} \in SE(3)^k$.
Figure 3: A comparison of predicted grasp-likelihood scores from AO-Grasp and baselines CGN, VAT-Mart, and W2A on synthetic point clouds, with top-1 proposals highlighted with black dots. Given that VAT-Mart and W2A use part-segmentation masks in their training and evaluation, while AO-Grasp does not, we include their predictions with and without ground truth masks. Compared to AO-Grasp, all baselines tend to propose non-actionable points more often. Point cloud sizes are 4K, 2K, 10K, and 10K for AO-Grasp, CGN, VAT-Mart, and W2A, respectively.
Figure 4: A comparison of predicted grasp-likelihood scores from AO-Grasp and ablations on synthetic point clouds of test category instances. Despite not being trained on any instances with more than one movable part, AO-Grasp (first row of heatmaps) accurately predicts scores for objects with multiple movable parts. For example, in the second and third columns, it detects points on all the handles, and in the last column it detects points on the edges of both doors. Moreover, it rarely predicts false positives, i.e., it does not predict points on the body of the object. In contrast, the model without pre-training PN++ and dense heatmaps (third row of heatmaps) does not predict grasp points on all handles (second and third columns) and predicts many false positives. Training the model with dense heatmaps (second row of heatmaps) reduces the number of false positives (e.g., the top-left corner of the cabinet in the fourth column). However, the number of false negatives (missed good points) also increases (e.g., the bottom door of the right-most cabinet).
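
The grasp proposal step described in Figure 2(d) pairs each point's grasp-likelihood score with a per-point orientation and keeps the top $k$. The following is a minimal sketch of that selection, not the authors' implementation: the function name `select_top_k_grasps`, the list-based data layout, and the use of 4x4 homogeneous matrices to represent poses in $SE(3)$ are all assumptions made for illustration. The scores stand in for $\hat{H}_{s,v}$ and the rotations stand in for the orientations that an orientation module such as Contact-GraspNet would supply.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def select_top_k_grasps(
    points: Sequence[Sequence[float]],
    scores: Sequence[float],
    rotations: Sequence[Sequence[Sequence[float]]],
    k: int,
) -> Tuple[List[int], List[Matrix]]:
    """Pick the k highest-scoring points and build one 4x4 pose per point.

    points:    N points, each (x, y, z), from the partial point cloud.
    scores:    N grasp-likelihood scores (hypothetical stand-in for the heatmap).
    rotations: N 3x3 rotation matrices (hypothetical per-point orientations).
    """
    if not (len(points) == len(scores) == len(rotations)):
        raise ValueError("points, scores, and rotations must have equal length")

    # Indices sorted by descending score; keep at most k of them.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = order[:k]

    poses: List[Matrix] = []
    for i in top:
        rot, pos = rotations[i], points[i]
        # Rows 0-2: rotation row followed by the matching translation component.
        pose = [[float(v) for v in rot[r]] + [float(pos[r])] for r in range(3)]
        pose.append([0.0, 0.0, 0.0, 1.0])  # homogeneous bottom row
        poses.append(pose)
    return top, poses


# Toy example with four points and identity orientations.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.5), (0.2, 0.3, 0.4), (0.5, 0.5, 0.5)]
scores = [0.1, 0.9, 0.4, 0.7]
rotations = [identity] * len(points)

top_idx, poses = select_top_k_grasps(points, scores, rotations, k=2)
```

In this toy example the two selected indices are the points with scores 0.9 and 0.7, and each returned pose carries the selected point as its translation column. Keeping the scoring and orientation inputs separate mirrors the two-stage structure in the caption, so either stage could be swapped out without changing the selection logic.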
