OmniPart: Part-Aware 3D Generation with Semantic Decoupling and Structural Cohesion

Yunhan Yang, Yufan Zhou, Yuan-Chen Guo, Zi-Xin Zou, Yukun Huang, Ying-Tian Liu, Hao Xu, Ding Liang, Yan-Pei Cao, Xihui Liu

TL;DR

OmniPart tackles the challenge of creating editable, part-based 3D assets by decoupling structure planning from dense part synthesis. It introduces an autoregressive bounding-box planner guided by flexible 2D masks, and a spatially-conditioned rectified-flow latent generator that jointly synthesizes all parts within the planned layout, using a TRELLIS-based structured latent space. The approach achieves state-of-the-art part-aware 3D generation with strong part-level control, coherence, and texture support, enabling tasks like animation and material editing. The framework makes complex 3D content more interpretable and editable while maintaining high fidelity and efficiency.

Abstract

The creation of 3D assets with explicit, editable part structures is crucial for advancing interactive applications, yet most generative methods produce only monolithic shapes, limiting their utility. We introduce OmniPart, a novel framework for part-aware 3D object generation designed to achieve high semantic decoupling among components while maintaining robust structural cohesion. OmniPart decouples this complex task into two synergistic stages: (1) an autoregressive structure planning module generates a controllable, variable-length sequence of 3D part bounding boxes, guided by flexible 2D part masks that allow for intuitive control over part decomposition without requiring direct correspondences or semantic labels; and (2) a spatially-conditioned rectified flow model, efficiently adapted from a pre-trained holistic 3D generator, synthesizes all 3D parts simultaneously and consistently within the planned layout. Our approach supports user-defined part granularity and precise localization, and enables diverse downstream applications. Extensive experiments demonstrate that OmniPart achieves state-of-the-art performance, paving the way for more interpretable, editable, and versatile 3D content.
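
To make stage (2) more concrete, the following is a minimal sketch of how sampling from a rectified flow model generally works. It is not OmniPart's implementation. It assumes the common convention in which a sample on the path is `x_t = (1 - t) * x0 + t * eps`, so the velocity is `eps - x0` and sampling integrates from `t = 1` (noise) to `t = 0` (data). The names `euler_sample` and `make_toy_velocity` are hypothetical, and the toy velocity field stands in for the learned, spatially-conditioned network.

```python
import numpy as np


def euler_sample(velocity_fn, x_noise, num_steps=50):
    """Integrate dx/dt = v(x, t) from t=1 (noise) to t=0 (data) with Euler steps."""
    x = x_noise.copy()
    ts = np.linspace(1.0, 0.0, num_steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        # t_next < t_cur, so the step size is negative (moving toward the data end).
        x = x + (t_next - t_cur) * velocity_fn(x, t_cur)
    return x


def make_toy_velocity(x0):
    """Exact velocity field when the data distribution is the single point x0.

    On the path x_t = (1 - t) * x0 + t * eps, the velocity eps - x0 equals
    (x_t - x0) / t. A trained network would replace this function (assumption).
    """
    def velocity(x, t):
        return (x - x0) / max(t, 1e-8)
    return velocity


rng = np.random.default_rng(0)
# Hypothetical target: latents for 4 tokens (e.g. voxels of all parts), 8 channels each.
target_latents = rng.normal(size=(4, 8))
noise = rng.normal(size=(4, 8))

sampled = euler_sample(make_toy_velocity(target_latents), noise, num_steps=25)
print(np.abs(sampled - target_latents).max())
```

In this toy setting the path is a straight line, so Euler integration recovers the target up to floating-point error. In a real model the velocity network would see the tokens of all parts together, which is what allows the parts to be denoised jointly.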

Paper Structure

This paper contains 15 sections, 9 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: An overview of the OmniPart model design. OmniPart generates part-aware, controllable, and high-quality 3D content through two key stages: part structure planning and structured part latent generation. Built upon TRELLIS (Xiang et al., 2024), which provides a spatially structured sparse voxel latent space, OmniPart first predicts part-level bounding boxes via an autoregressive planner. Then, part-specific latent codes are generated by fine-tuning a large-scale shape model pretrained on whole objects.
  • Figure 2: Spatially-conditioned part synthesis. The sparse voxels of the whole shape and each part are filled with noisy latents, which are denoised with a network composed of part-aware sparse downsample/upsample layers and transformer layers. The tokens are augmented with position embeddings and part position embeddings (PPE). The denoising process also predicts a validity score for each voxel to discard redundant voxels (the ones with stripes in the figure) in each box.
  • Figure 3: Visualization of the training dataset. We show the distribution of part counts across the dataset (Number of parts per model vs. Frequency) and include representative examples from four different part-count ranges.
  • Figure 4: Qualitative comparison of part-aware 3D generation. Our method leverages TRELLIS to decode both meshes and 3D Gaussian splats, baking color onto the mesh to produce textured parts. HoloPart and Part123 are visualized using solid colors because they lack texture support. Segmentation-based methods (e.g., PartField) capture only surface-level masks, while completion-based methods (e.g., HoloPart) are limited by segmentation quality. PartGen generates full parts but with low geometric and semantic quality. In contrast, our method achieves low semantic coupling and high structural cohesion.
  • Figure 5: Applications of our part-aware 3D generation framework. (a) Mask-Controlled Generation: Users can specify 2D masks to guide the structure of the generated parts. (b) Multi-Granularity Generation: Adjusting the segmentation scale of 2D masks enables generation at different levels of part granularity. (c) Material Editing: Part-specific textures, such as clothing items, can be modified independently. (d) Geometry Processing: Our part-aware outputs support high-quality geometry processing (such as remeshing) and preserve structural coherence, avoiding artifacts at part boundaries.
  • ...and 1 more figure
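
The Figure 2 caption mentions a per-voxel validity score used to discard redundant voxels inside each box. A minimal sketch of that filtering step is given below. It assumes that each part's candidate voxels are the whole-shape voxels falling inside the part's axis-aligned bounding box, and that a fixed threshold of 0.5 is applied to the scores. The function names, the threshold, and the score values are all hypothetical and are not taken from the paper.

```python
import numpy as np


def voxels_in_box(coords, box_min, box_max):
    """Return the occupied voxel coordinates inside an axis-aligned box (inclusive)."""
    inside = np.all((coords >= box_min) & (coords <= box_max), axis=1)
    return coords[inside]


def keep_valid(coords, validity_scores, threshold=0.5):
    """Keep only voxels whose predicted validity score reaches the threshold."""
    return coords[validity_scores >= threshold]


# Toy whole-shape sparse voxels as integer (x, y, z) coordinates.
whole_shape = np.array([[0, 0, 0], [1, 1, 1], [2, 2, 2], [5, 5, 5]])

# Hypothetical planned box for one part.
box_min = np.array([0, 0, 0])
box_max = np.array([2, 2, 2])

candidates = voxels_in_box(whole_shape, box_min, box_max)

# Hypothetical validity scores, one per candidate voxel; in a real model
# these would be predicted by the denoising network.
scores = np.array([0.9, 0.2, 0.7])

part_voxels = keep_valid(candidates, scores)
print(part_voxels)
```

Here the box captures three of the four voxels, and the low-scoring one is dropped, which mirrors the striped voxels in the figure that lie inside a box but do not belong to the part.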