Generalizable Articulated Object Perception with Superpoints

Qiaojun Yu, Ce Hao, Xibin Yuan, Li Zhang, Liu Liu, Yukang Huo, Rohit Agarwal, Cewu Lu

TL;DR

This work tackles the challenge of perceiving and segmenting articulated object parts in 3D point clouds for manipulation tasks. It introduces GAPS, a framework that combines learnable part-aware superpoints, SAM-based 2D priors to identify 3D query points, and a query-based transformer decoder to achieve precise, generalizable part segmentation. The approach yields state-of-the-art cross-category performance on GAPartNet, with AP50 of 77.9% for seen objects and 39.3% for unseen objects, showing strong generalization to novel geometries. By integrating local geometric cues via superpoints and 2D priors within a transformer-based segmentation head, GAPS enhances robustness to variation in object parts and sizes, facilitating more reliable manipulation of articulated objects in robotics.
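For readers unfamiliar with the metric, AP50 conventionally denotes average precision where a predicted part counts as correct only if its overlap (intersection over union, IoU) with a ground-truth part reaches 50%. The sketch below illustrates only that matching criterion on per-point masks. It is not the benchmark's evaluation code: the function names are hypothetical, and evaluation toolkits differ on whether the threshold comparison is strict.

```python
def point_iou(pred_points, gt_points):
    """IoU between two part masks given as collections of point indices."""
    pred, gt = set(pred_points), set(gt_points)
    union = pred | gt
    if not union:
        return 0.0  # two empty masks: treat as no overlap
    return len(pred & gt) / len(union)


def is_match_at_50(pred_points, gt_points, threshold=0.5):
    """True if the prediction overlaps the ground truth enough to count at AP50.

    Assumption: a non-strict comparison (>=); some toolkits use a strict one.
    """
    return point_iou(pred_points, gt_points) >= threshold


# A prediction covering points 0-3 against a ground-truth part covering 1-3.
print(point_iou([0, 1, 2, 3], [1, 2, 3]))
print(is_match_at_50([0, 1, 2, 3], [1, 2, 3]))
```

The full metric additionally ranks predictions by confidence and averages precision over recall levels, which is omitted here.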

Abstract

Manipulating articulated objects with robotic arms is challenging due to the complex kinematic structure, which requires precise part segmentation for efficient manipulation. In this work, we introduce a novel superpoint-based perception method designed to improve part segmentation in 3D point clouds of articulated objects. We propose a learnable, part-aware superpoint generation technique that efficiently groups points based on their geometric and semantic similarities, resulting in clearer part boundaries. Furthermore, by leveraging the segmentation capabilities of the 2D foundation model SAM, we identify the centers of pixel regions and select corresponding superpoints as candidate query points. Integrating a query-based transformer decoder further enhances our method's ability to achieve precise part segmentation. Experimental results on the GAPartNet dataset show that our method outperforms existing state-of-the-art approaches in cross-category part segmentation, achieving AP50 scores of 77.9% for seen categories (4.4% improvement) and 39.3% for unseen categories (11.6% improvement), with superior results in 5 out of 9 part categories for seen objects and outperforming all previous methods across all part categories for unseen objects.
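The query-selection step described above (take the center of each 2D region, then pick the superpoint that the center pixel maps to) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the masks are assumed to be binary 2D grids already produced by a segmenter such as SAM, and `pixel_to_point`, `point_to_superpoint`, and both function names are hypothetical.

```python
def mask_center(mask):
    """Return the mask pixel (row, col) closest to the mask's centroid.

    Snapping to an actual mask pixel keeps the center inside the region
    even when the region is not convex.
    """
    pixels = [(r, c) for r, row in enumerate(mask)
              for c, value in enumerate(row) if value]
    if not pixels:
        return None
    mean_r = sum(r for r, _ in pixels) / len(pixels)
    mean_c = sum(c for _, c in pixels) / len(pixels)
    return min(pixels, key=lambda p: (p[0] - mean_r) ** 2 + (p[1] - mean_c) ** 2)


def select_query_superpoints(masks, pixel_to_point, point_to_superpoint):
    """Pick one candidate superpoint per 2D region.

    masks: list of binary 2D grids (hypothetical segmenter output).
    pixel_to_point: dict mapping (row, col) to a 3D point index
        (hypothetical; pixels without valid depth are simply absent).
    point_to_superpoint: list giving each point's superpoint id.
    """
    queries = []
    for mask in masks:
        center = mask_center(mask)
        if center is None:
            continue  # empty region
        point_index = pixel_to_point.get(center)
        if point_index is None:
            continue  # center pixel has no corresponding 3D point
        superpoint = point_to_superpoint[point_index]
        if superpoint not in queries:
            queries.append(superpoint)  # keep each superpoint once
    return queries


# Two toy 3x3 regions: the top row and the bottom row of an image.
top = [[1, 1, 1], [0, 0, 0], [0, 0, 0]]
bottom = [[0, 0, 0], [0, 0, 0], [1, 1, 1]]
pixel_to_point = {(0, 1): 0, (2, 1): 3}
point_to_superpoint = [5, 5, 7, 7]
print(select_query_superpoints([top, bottom], pixel_to_point, point_to_superpoint))
```

In this toy setup each region contributes one candidate superpoint; in the method described above, such candidates would then be passed to the transformer decoder as queries.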

Paper Structure

This paper contains 10 sections, 8 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: GAPS segments articulated objects into semantic parts. It uses 3D point clouds to cluster superpoints and 2D image segmentation to infer part centers, which serve as queries for a transformer decoder that performs part segmentation.
  • Figure 2: Experimental results of part segmentation using rule-based and learnable superpoints. The segmented parts are marked with red boxes.