Name That Part: 3D Part Segmentation and Naming

Soumava Paul, Prakhar Kaushik, Ankit Vaidya, Anand Bhattad, Alan Yuille

TL;DR

ALIGN-Parts reframes semantic 3D part segmentation as a direct set-alignment problem, producing named part decompositions in a single forward pass. It introduces Partlets (shape-conditioned, part-level representations) that are learned jointly with multi-modal geometry, appearance, and affordance-based text embeddings and are then matched to candidate part descriptions via differentiable optimal transport. The approach enables open-vocabulary naming, permutation-consistent labeling, and efficient annotation at scale, as demonstrated by strong improvements over baselines and by the creation of TexParts with human-in-the-loop verification. The work also provides new metrics for named 3D part segmentation and shows potential for scalable data generation and downstream 3D asset annotation tasks.
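
The differentiable optimal-transport matching can be illustrated with a log-domain Sinkhorn iteration. The NumPy block below is a minimal sketch under our own assumptions, not the authors' implementation. It assumes cosine similarity as the score, uniform marginals over partlets and candidate descriptions, and a temperature `eps` and iteration count picked for illustration. The function name `sinkhorn_match` is hypothetical.

```python
import numpy as np

def logsumexp(x, axis):
    # Numerically stable log(sum(exp(x))) along one axis
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))

def sinkhorn_match(partlets, texts, eps=0.05, n_iters=100):
    """Entropic OT plan between K partlet and M text embeddings (hypothetical sketch).

    Assumptions, not from the paper: cosine-similarity score,
    uniform marginals, temperature eps, fixed iteration count.
    """
    # L2-normalise so that dot products are cosine similarities
    p = partlets / np.linalg.norm(partlets, axis=1, keepdims=True)
    t = texts / np.linalg.norm(texts, axis=1, keepdims=True)
    log_k = (p @ t.T) / eps              # (K, M) log-kernel = similarity / eps
    K, M = log_k.shape
    log_a = np.full(K, -np.log(K))       # uniform mass on each partlet
    log_b = np.full(M, -np.log(M))       # uniform mass on each description
    f, g = np.zeros(K), np.zeros(M)
    for _ in range(n_iters):
        # Alternate row and column scaling in the log domain
        f = log_a - logsumexp(log_k + g[None, :], axis=1)
        g = log_b - logsumexp(log_k + f[:, None], axis=0)
    # Soft transport plan; its entries sum to 1
    return np.exp(f[:, None] + log_k + g[None, :])
```

Taking the row-wise argmax of the soft plan gives a hard partlet-to-name assignment. Keeping the plan soft during training is what lets gradients flow through the matching.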

Abstract

We address semantic 3D part segmentation: decomposing objects into parts with meaningful names. Datasets with part annotations exist, but their part definitions are inconsistent across datasets, which limits robust training. Previous methods either produce unlabeled decompositions or retrieve single parts without complete shape annotations. We propose ALIGN-Parts, which formulates part naming as a direct set-alignment task. Our method decomposes shapes into partlets (implicit 3D part representations) that are matched to part descriptions via bipartite assignment. We combine geometric cues from 3D part fields, appearance from multi-view vision features, and semantic knowledge from language-model-generated affordance descriptions. A text-alignment loss ensures that partlets share an embedding space with text, which in principle enables open-vocabulary matching given sufficient data. Our novel, efficient, one-shot 3D part segmentation and naming method finds applications in several downstream tasks, including serving as a scalable annotation engine. Because our model supports zero-shot matching to arbitrary descriptions and confidence-calibrated predictions for known categories, we use it with human verification to create a unified ontology that aligns PartNet, 3DCoMPaT++, and Find3D and comprises 1,794 unique 3D parts. We also show examples from our newly created Tex-Parts dataset and introduce two novel metrics for the named 3D part segmentation task.
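
The text-alignment loss can be sketched as a symmetric InfoNCE objective over partlet/text pairs that have already been matched. The Figure 2 caption names InfoNCE, but the block below is a minimal NumPy sketch, not the paper's code. The temperature, the symmetric two-direction form, and the function name `info_nce` are all assumptions made for illustration.

```python
import numpy as np

def info_nce(partlets, texts, temperature=0.07):
    """Symmetric InfoNCE over matched pairs (hypothetical sketch).

    Assumes row i of `partlets` is matched to row i of `texts`
    (e.g. after the assignment step); all other rows act as negatives.
    """
    p = partlets / np.linalg.norm(partlets, axis=1, keepdims=True)
    t = texts / np.linalg.norm(texts, axis=1, keepdims=True)
    logits = (p @ t.T) / temperature   # (N, N) scaled cosine similarities
    idx = np.arange(len(logits))

    def cross_entropy(l):
        # Mean negative log-softmax of the diagonal (positive) entries
        l = l - l.max(axis=1, keepdims=True)
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_prob[idx, idx].mean()

    # Average the partlet->text and text->partlet directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

The loss is small when each partlet is closest to its own description and grows when embeddings are closer to the wrong names. This pressure is what places partlets and text in a shared space.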

Paper Structure

This paper contains 30 sections, 29 equations, 12 figures, 4 tables.

Figures (12)

  • Figure 1: ALIGN-Parts segments and names 3D parts, unlike PartField liu2025partfield, which only segments. Excluding data pre-processing, our method generates these segments together with their names $100\times$ faster than PartField.
  • Figure 2: ALIGN-Parts. Overview of the ALIGN-Parts framework for language-grounded 3D part segmentation and naming. Top: training. Given a 3D shape from our semantically unified 3D part data, geometry features are extracted with PartField and appearance features with DINOv2 from multi-view renderings. These are fused by the BiCo Fusion module using efficient bi-directional cross-attention on local $k{=}16$ nearest-neighbor graphs in 3D space, which reduces complexity from $\mathcal{O}(N^2)$ to $\mathcal{O}(Nk)$ and yields enriched point features. A decoder then learns $K$ part-level "Partlet" representations that aggregate the fused features, with segmentation supervision provided at the Partlet level. To semantically ground Partlets, an LLM generates affordance-aware descriptions for each possible part (e.g., "structural supports that elevate the sofa" for sofa legs), which are embedded by a pretrained MPNet encoder. Sinkhorn matching establishes a bipartite assignment between Partlet and text embeddings, an InfoNCE loss further aligns them in a shared representation space, and a classifier predicts the object category. Bottom: inference. At test time, ALIGN-Parts operates in both closed-vocabulary (object categories similar to those seen in training) and open-vocabulary (novel object categories) settings. In the closed-vocabulary case, the trained 3D classifier predicts the object class and retrieves its candidate part list; in the open-vocabulary case, an LLM proposes an overcomplete set of plausible parts for the queried object. The MPNet embeddings of these candidate part texts are bipartite-matched to the predicted Partlets, yielding 3D part segmentation masks together with their part names.
  • Figure 3: Pairwise cosine similarity heatmaps between text embeddings for MPNet (left) and SigLIP (right). (Zoom in for labels)
  • Figure 4: Qualitative Results. ALIGN-Parts segments and names 3D parts robustly in a single feed-forward pass (rightmost column). Find3D ma2024find (first column) fails despite being given ground-truth part names; for example, it cannot segment the laptop (bottom row). PartField liu2025partfield (second column) also fails: it requires ground-truth part counts for clustering, mis-segments bed bunks (top row), and misses refrigerator handles (second-to-last row). Our strong baseline without Partlets (third column) exhibits similar errors. In contrast, ALIGN-Parts correctly segments tiny parts, such as handles, and groups semantically similar instances (e.g., all ceiling fan blades into a single cluster).
  • Figure 5: Proposed Metrics Correlation Analysis. Correlation analysis between our proposed label-aware mIoU metrics (strict/relaxed) and class-agnostic mIoU, computed on segmentation results from our ALIGN-Parts model. The strict label-aware metric (left) shows moderate agreement with class-agnostic mIoU (Pearson $r=0.739$, Spearman $\rho=0.730$, $N=206$), while the relaxed variant (right) demonstrates near-perfect correlation (Pearson $r=0.978$, Spearman $\rho=0.974$, $N=206$). These findings indicate that our model achieves strong semantic and quantitative consistency, further supporting the use of the relaxed metric as a robust evaluation protocol for semantic 3D part segmentation.
  • ...and 7 more figures
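
The BiCo Fusion step described in Figure 2 restricts bi-directional cross-attention to each point's $k$ nearest neighbours, which is where the $\mathcal{O}(Nk)$ cost comes from. The NumPy block below illustrates only that idea and is not the authors' module. It assumes geometry and appearance features have already been projected to a common width d. It also omits learned query/key/value projections and multiple heads, and fuses the two streams by residual addition plus concatenation. All function names are hypothetical.

```python
import numpy as np

def knn_indices(xyz, k):
    # Brute-force k-NN for clarity; this distance matrix is O(N^2).
    # A spatial index (e.g. a KD-tree) would keep neighbour search sub-quadratic.
    d2 = ((xyz[:, None, :] - xyz[None, :, :]) ** 2).sum(-1)
    return np.argsort(d2, axis=1)[:, :k]           # (N, k), includes the point itself

def local_cross_attention(q_feat, kv_feat, nbr):
    # Each point attends only to its k neighbours: O(N k d) instead of O(N^2 d)
    d = q_feat.shape[1]
    keys = kv_feat[nbr]                            # (N, k, d) neighbour features
    scores = np.einsum("nd,nkd->nk", q_feat, keys) / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)              # softmax over the k neighbours
    return np.einsum("nk,nkd->nd", w, keys)        # convex combination of neighbours

def bico_fusion_sketch(xyz, geo, app, k=16):
    # Hypothetical fusion: geo and app must share the same width d
    nbr = knn_indices(xyz, k)
    geo_ctx = local_cross_attention(geo, app, nbr)  # geometry queries appearance
    app_ctx = local_cross_attention(app, geo, nbr)  # appearance queries geometry
    # Residual update of both streams, then concatenation as the fused feature
    return np.concatenate([geo + geo_ctx, app + app_ctx], axis=1)
```

Because each softmax runs over only $k$ neighbours, memory and compute for the attention step grow linearly in the number of points for fixed $k$.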