ZeroPS: High-quality Cross-modal Knowledge Transfer for Zero-Shot 3D Part Segmentation
Yuheng Xue, Nenglun Chen, Jun Liu, Wenyun Sun
TL;DR
ZeroPS tackles zero-shot 3D part segmentation by transferring knowledge from 2D foundation models (SAM and GLIP) to 3D point clouds through a training-free, multi-view framework. It introduces self-extension to lift 2D SAM segments into 3D, a merging step that combines them into coherent 3D parts, and a multi-modal labeling step based on CNVP and TDCM that assigns class labels to those parts without training. On PartNetE and AKBSeg, ZeroPS substantially outperforms state-of-the-art zero-shot methods and narrows the gap to fully supervised approaches, while remaining robust to domain shift. Because it requires no training or fine-tuning of the foundation models, the approach offers a practical, scalable pathway for zero-shot 3D segmentation in real-world settings.
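The merging step can be pictured with a minimal sketch. Assume, for illustration only, that each lifted 3D part is a set of point indices and that two parts belong together when their intersection-over-union (IoU) exceeds a hypothetical threshold `iou_thresh`. The union-find grouping below is a stand-in chosen for clarity, not the merging criterion used by ZeroPS.

```python
from itertools import combinations


def iou(a: set[int], b: set[int]) -> float:
    """Intersection-over-union of two point-index sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def merge_parts(parts: list[set[int]], iou_thresh: float = 0.5) -> list[set[int]]:
    """Group overlapping 3D parts with union-find.

    The IoU rule and iou_thresh are hypothetical; IoU is tested pairwise
    on the original (unmerged) parts, and connected groups are unioned.
    """
    parent = list(range(len(parts)))

    def find(i: int) -> int:
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(parts)), 2):
        if iou(parts[i], parts[j]) > iou_thresh:
            parent[find(i)] = find(j)

    # Collect the point indices of every part under its group root.
    groups: dict[int, set[int]] = {}
    for i, part in enumerate(parts):
        groups.setdefault(find(i), set()).update(part)
    return list(groups.values())


if __name__ == "__main__":
    parts = [{0, 1, 2, 3}, {1, 2, 3, 4}, {10, 11}]
    print(sorted(sorted(p) for p in merge_parts(parts)))
```

Union-find is used here because overlap is transitive in effect: if part A overlaps B and B overlaps C, all three end up in one group even when A and C barely touch.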
Abstract
Zero-shot 3D part segmentation is a challenging and fundamental task. In this work, we propose a novel pipeline, ZeroPS, which achieves high-quality knowledge transfer from two 2D pretrained foundation models (FMs), SAM and GLIP, to 3D object point clouds. We explore the natural relationship between multi-view correspondence and the FMs' prompt mechanisms and build bridges upon it. In ZeroPS, this relationship manifests in three ways: 1) lifting 2D to 3D by leveraging co-viewed regions and SAM's prompt mechanism, 2) relating 1D class names to 3D parts by leveraging 2D-3D view projection and GLIP's prompt mechanism, and 3) enhancing prediction performance by leveraging multi-view observations. Extensive evaluations on the PartNetE and AKBSeg benchmarks demonstrate that ZeroPS significantly outperforms the state-of-the-art (SOTA) method on both zero-shot unlabeled and zero-shot instance segmentation. ZeroPS requires no additional training or fine-tuning of the FMs, applies to both simulated and real-world data, and is hardly affected by domain shift. The project page is available at https://luis2088.github.io/ZeroPS_page.
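Points 2) and 3) can be illustrated with a minimal sketch: project each 3D part into several views with a pinhole camera, let every 2D detection box vote for the part whose projected points it covers most, and take the majority class per part across views. The translation-only camera, the hypothetical `Box` and `View` structures, the coverage-based vote, and the plain majority rule are all illustrative assumptions; this does not reproduce GLIP's outputs or the paper's CNVP/TDCM labeling mechanisms.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from dataclasses import dataclass

Point = tuple[float, float, float]


@dataclass
class Box:
    """Hypothetical 2D detection in one view: class name and pixel bounds."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, u: float, v: float) -> bool:
        return self.x0 <= u <= self.x1 and self.y0 <= v <= self.y1


@dataclass
class View:
    """Hypothetical view: camera position (rotation omitted) and its detections."""
    cam_pos: Point
    boxes: list[Box]


def project(p: Point, f: float, cx: float, cy: float) -> tuple[float, float] | None:
    """Pinhole projection of a camera-frame point; None if it is behind the camera."""
    x, y, z = p
    if z <= 0:
        return None
    return f * x / z + cx, f * y / z + cy


def label_parts(parts: dict[str, list[Point]], views: list[View],
                f: float = 100.0, cx: float = 64.0, cy: float = 64.0) -> dict[str, str]:
    votes: dict[str, Counter] = defaultdict(Counter)
    for view in views:
        # Move each part into this camera's frame (translation only) and project it.
        pixels = {}
        for name, pts in parts.items():
            uv = [project(tuple(a - b for a, b in zip(p, view.cam_pos)), f, cx, cy) for p in pts]
            pixels[name] = [q for q in uv if q is not None]
        for box in view.boxes:
            # Each box votes for the part with the most projected points inside it.
            coverage = {name: sum(box.contains(u, v) for u, v in uv) for name, uv in pixels.items()}
            best = max(coverage, key=coverage.get, default=None)
            if best is not None and coverage[best] > 0:
                votes[best][box.label] += 1
    # Majority class per part across views; parts with no votes stay unlabeled.
    return {name: votes[name].most_common(1)[0][0] for name in parts if votes[name]}


if __name__ == "__main__":
    parts = {
        "part_0": [(0.0, 0.5, 5.0), (0.2, 0.5, 5.0)],
        "part_1": [(0.0, -1.0, 5.0), (0.2, -1.0, 5.0)],
    }
    views = [
        View((0.0, 0.0, 0.0), [Box("seat", 60, 70, 72, 80), Box("back", 60, 40, 72, 50)]),
        View((0.0, 0.0, -5.0), [Box("seat", 60, 65, 70, 75)]),
    ]
    print(label_parts(parts, views))
```

Accumulating votes over all views, rather than trusting a single view, is one simple way to let multi-view observations correct a detection that is wrong or missing in any one image.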
