GAPartNet: Cross-Category Domain-Generalizable Object Perception and Manipulation via Generalizable and Actionable Parts

Haoran Geng, Helin Xu, Chengyang Zhao, Chao Xu, Li Yi, Siyuan Huang, He Wang

TL;DR

This paper presents a cross-category framework for generalizable object perception and manipulation built around Generalizable and Actionable Parts (GAParts). It introduces GAPartNet, a large-scale dataset with 9 GAPart classes across 27 object categories and rich part-level annotations (8,489 part instances on 1,166 objects), supporting cross-category part segmentation, part pose estimation, and part-based manipulation. The authors propose a domain-generalizable 3D part segmentation method based on domain-adversarial learning, NPCS-based pose estimation, and GAPart-driven manipulation heuristics that transfer to unseen categories in both simulation and the real world. The segmentation method is reported to outperform existing methods by a large margin on both seen and unseen categories, which supports the use of GAParts for cross-category robotic interaction.

Abstract

For years, researchers have been devoted to generalizable object perception and manipulation, where cross-category generalizability is highly desired yet underexplored. In this work, we propose to learn such cross-category skills via Generalizable and Actionable Parts (GAParts). By identifying and defining 9 GAPart classes (lids, handles, etc.) in 27 object categories, we construct a large-scale part-centric interactive dataset, GAPartNet, where we provide rich, part-level annotations (semantics, poses) for 8,489 part instances on 1,166 objects. Based on GAPartNet, we investigate three cross-category tasks: part segmentation, part pose estimation, and part-based object manipulation. Given the significant domain gaps between seen and unseen object categories, we propose a robust 3D segmentation method from the perspective of domain generalization by integrating adversarial learning techniques. Our method outperforms all existing methods by a large margin on both seen and unseen categories. Furthermore, with part segmentation and pose estimation results, we leverage the GAPart pose definition to design part-based manipulation heuristics that generalize well to unseen object categories in both the simulator and the real world. Our dataset, code, and demos are available on our project page.

Paper Structure (60 sections, 7 equations, 11 figures, 6 tables)

Figures (11)

  • Figure 1: Overview. We propose to learn generalizable object perception and manipulation skills via Generalizable and Actionable Parts, and present GAPartNet, a large-scale interactive dataset with rich part annotations. We propose a domain generalization method for cross-category part segmentation and pose estimation. Our GAPart definition boosts cross-category object manipulation and transfers to the real world.
  • Figure 2: GAPart Classes. Here we highlight the parts from 9 GAPart classes along with their normalized part coordinate spaces. The top row shows the four GAPart classes that have continuous rotation symmetry about the $z$ axis, marked with a red dashed line and the $\infty$ symbol; the bottom left shows the two GAPart classes that have $180^{\circ}$ mirror symmetry about the $z$ axis; and the bottom right shows the remaining three asymmetric GAPart classes.
  • Figure 3: GAPartNet Objects. Objects collected from AKB-48 (Liu et al., 2022) end with '-A', while the others are from PartNet-Mobility (Xiang et al., 2020).
  • Figure 4: An Overview of Our Domain-generalizable Part Segmentation and Pose Estimation Method. We introduce a part-oriented domain adversarial training strategy that can tackle multi-resolution features and distribution imbalance for the domain-invariant GAPart feature extraction. The training strategy tackles the challenges in our tasks and dataset, significantly improving the generalizability of our method for part segmentation and pose estimation.
  • Figure 5: Qualitative Results of Perception. The two left panels show the results of cross-category part segmentation and pose estimation on seen and unseen categories, while the right panel shows failure cases. Only the revolute joint estimation results are shown here.
  • ...and 6 more figures
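
The symmetry groups named in the Figure 2 caption affect how a predicted part orientation should be compared with the ground truth: a rotation about the $z$ axis that maps the part onto itself should not count as error. The following is a minimal sketch of that idea, restricted to the rotation angle about $z$. The function name `z_angle_error` and the three symmetry labels are hypothetical and chosen for illustration; they are not taken from the paper or its code.

```python
def z_angle_error(pred_deg, gt_deg, symmetry):
    """Angular error about the z axis in degrees, discounting part symmetry.

    symmetry is a hypothetical label, one of:
      "continuous" - any rotation about z leaves the part unchanged
      "half_turn"  - the part looks the same after a 180 degree turn about z
      "none"       - asymmetric part
    """
    if symmetry == "continuous":
        # Every rotation about z is equivalent, so there is no error.
        return 0.0
    # Equivalent orientations repeat with this period.
    period = {"half_turn": 180.0, "none": 360.0}[symmetry]
    # Wrap the difference into [0, period) and take the shorter way round.
    diff = (pred_deg - gt_deg) % period
    return min(diff, period - diff)
```

Under these assumptions, a prediction that is off by 190 degrees on a half-turn-symmetric part is treated the same as one that is off by 10 degrees, while the same prediction on an asymmetric part is scored as 170 degrees off.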