SAMPart3D: Segment Any Part in 3D Objects

Yunhan Yang, Yukun Huang, Yuan-Chen Guo, Liangjun Lu, Xiaoyang Wu, Edmund Y. Lam, Yan-Pei Cao, Xihui Liu

TL;DR

SAMPart3D performs zero-shot 3D part segmentation on open-world objects without text prompts or predefined part label sets. It combines text-agnostic 2D-to-3D feature distillation from DINOv2 with a scale-conditioned grouping module to segment parts at multiple granularities, then queries multimodal large language models (MLLMs) on multi-view renderings to assign a semantic label to each part. A new benchmark, PartObjaverse-Tiny, supports evaluation on fine-grained parts, where SAMPart3D outperforms prior zero-shot methods. The backbone is pre-trained on the large-scale Objaverse dataset, and the resulting segmentation supports applications such as part-level editing and interactive segmentation.

Abstract

3D part segmentation is a crucial and challenging task in 3D perception, playing a vital role in applications such as robotics, 3D generation, and 3D editing. Recent methods harness powerful Vision Language Models (VLMs) for 2D-to-3D knowledge distillation, achieving zero-shot 3D part segmentation. However, these methods are limited by their reliance on text prompts, which restricts both their scalability to large unlabeled datasets and their flexibility in handling part ambiguities. In this work, we introduce SAMPart3D, a scalable zero-shot 3D part segmentation framework that segments any 3D object into semantic parts at multiple granularities, without requiring predefined part label sets as text prompts. For scalability, we use text-agnostic vision foundation models to distill a 3D feature extraction backbone, which allows scaling to large unlabeled 3D datasets to learn rich 3D priors. For flexibility, we distill scale-conditioned part-aware 3D features for 3D part segmentation at multiple granularities. Once the segmented parts are obtained from the scale-conditioned part-aware 3D features, we use VLMs to assign a semantic label to each part based on multi-view renderings. Compared to previous methods, SAMPart3D can scale to the recent large-scale 3D object dataset Objaverse and handle complex, non-ordinary objects. Additionally, we contribute a new 3D part segmentation benchmark to address the lack of diversity and complexity of objects and parts in existing benchmarks. Experiments show that SAMPart3D significantly outperforms existing zero-shot 3D part segmentation methods and can facilitate various applications such as part-level editing and interactive segmentation.
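The 2D-to-3D distillation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the input format (each view as a dictionary from visible point index to the 2D feature at the pixel the point projects to), the averaging of features across views, the mean squared error objective, and the function names `aggregate_view_features` and `distillation_loss` are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def aggregate_view_features(
    num_points: int,
    views: List[Dict[int, Vector]],
) -> List[List[float]]:
    """Average the 2D features observed for each 3D point over all views.

    Assumption: every view maps a visible point index to the 2D feature
    sampled at the pixel that point projects to. Points that are never
    visible get an empty list and are skipped by the loss below.
    """
    sums: List[List[float]] = [[] for _ in range(num_points)]
    counts = [0] * num_points
    for view in views:
        for idx, feat in view.items():
            if not sums[idx]:
                sums[idx] = [0.0] * len(feat)
            for d, value in enumerate(feat):
                sums[idx][d] += value
            counts[idx] += 1
    return [
        [v / counts[i] for v in sums[i]] if counts[i] else []
        for i in range(num_points)
    ]


def distillation_loss(
    predicted: List[Vector],
    targets: List[Vector],
) -> float:
    """Mean squared error between predicted 3D features and 2D targets.

    Assumption: MSE stands in for whatever objective the paper uses.
    Points without a 2D target (empty list) do not contribute.
    """
    total, n = 0.0, 0
    for pred, target in zip(predicted, targets):
        if not target:
            continue
        total += sum((p - t) ** 2 for p, t in zip(pred, target))
        n += len(target)
    return total / n if n else 0.0
```

In a real pipeline `predicted` would come from the 3D backbone and the loss would be minimized by gradient descent; here both are plain lists so the supervision signal itself is easy to inspect. No text prompt or part label appears anywhere in the inputs, which is the property the abstract relies on for scaling to unlabeled data.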

Paper Structure

This paper contains 19 sections, 5 equations, 11 figures, 6 tables.

Figures (11)

  • Figure 1: SAMPart3D is able to segment any 3D object into semantic parts across multiple levels of granularity, without the need for predefined part label sets or text prompts. It supports a range of applications, including part-level editing and interactive segmentation.
  • Figure 2: Overview of the SAMPart3D pipeline. (a) We first pre-train the 3D backbone PTv3-object on the large-scale 3D dataset Objaverse, distilling visual features from FeatUp-DINOv2. (b) Next, we train lightweight MLPs that distill 2D masks into scale-conditioned grouping features. (c) Finally, we cluster the point cloud features, highlight the consistent 2D part area on multi-view renderings via the 2D-3D mapping, and query MLLMs for part semantics.
  • Figure 3: Visualization of PartObjaverse-Tiny with part-level semantic and instance segmentation labels.
  • Figure 4: Visualization of multi-granularity 3D part segmentation on GSO (Downs et al., 2022), OmniObject3D (Wu et al., 2023), Vroid (Chen et al., 2023), and 3D generated meshes.
  • Figure 5: Qualitative comparison with PartSLIP (Liu et al., 2023) and SATR (Abdelreheem et al., 2023) in the semantic segmentation task on the PartObjaverse-Tiny dataset.
  • ...and 6 more figures
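
Step (c) of the Figure 2 caption, highlighting each clustered part on the multi-view renderings through the 2D-3D mapping, can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's code: the per-point part labels are taken as given (the clustering itself is omitted), each view is a hypothetical dictionary from pixel coordinates to the index of the visible 3D point, and picking the view in which a part covers the most pixels is an assumed selection rule. The names `part_masks` and `best_view_per_part` are invented here.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]


def part_masks(
    pixel_to_point: Dict[Pixel, int],
    point_labels: List[int],
) -> Dict[int, Set[Pixel]]:
    """Group the pixels of one rendered view by the 3D part they show.

    pixel_to_point is the (assumed) 2D-3D mapping of the view: for each
    foreground pixel, the index of the 3D point visible at that pixel.
    """
    masks: Dict[int, Set[Pixel]] = defaultdict(set)
    for pixel, point_idx in pixel_to_point.items():
        masks[point_labels[point_idx]].add(pixel)
    return dict(masks)


def best_view_per_part(
    views: List[Dict[Pixel, int]],
    point_labels: List[int],
) -> Dict[int, int]:
    """Pick, for every part, the view where it covers the most pixels.

    Assumption: largest visible area is used as the selection rule;
    ties keep the earliest view.
    """
    best_view: Dict[int, int] = {}
    best_area: Dict[int, int] = {}
    for view_idx, pixel_to_point in enumerate(views):
        for part, mask in part_masks(pixel_to_point, point_labels).items():
            if len(mask) > best_area.get(part, 0):
                best_view[part] = view_idx
                best_area[part] = len(mask)
    return best_view
```

The mask returned for a part in its selected view is the region that would be highlighted in the rendering before it is passed to an MLLM with a question about what the highlighted part is. The MLLM call is left out because its interface is not specified in this section.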