Part123: Part-aware 3D Reconstruction from a Single-view Image

Anran Liu, Cheng Lin, Yuan Liu, Xiaoxiao Long, Zhiyang Dou, Hao-Xiang Guo, Ping Luo, Wenping Wang

TL;DR

Part123 addresses the challenge of reconstructing 3D shapes from a single image while preserving meaningful part structure. It integrates multiview diffusion for consistent view synthesis, SAM-based 2D segmentation, and a contrastive-learning–augmented NeuS to build a part-aware 3D representation, followed by an automatic method to derive 3D parts. The approach achieves competitive reconstruction quality with state-of-the-art single-view methods and substantially improves part segmentation quality, enabling applications such as feature-preserving reconstruction, primitive fitting, and shape editing. By leveraging 2D segmentation generalization and automatic part discovery, Part123 offers robust, open-ended part-aware 3D modeling applicable across diverse object categories and backbones, advancing practical 3D shape understanding from limited inputs.

Abstract

Recently, the emergence of diffusion models has opened up new opportunities for single-view reconstruction. However, all existing methods represent the target object as a closed mesh devoid of any structural information, thus neglecting the part-based structure of the reconstructed shape, which is crucial for many downstream applications. Moreover, the generated meshes usually suffer from heavy noise, rough surfaces, and blurry textures, making it challenging to obtain satisfactory part segments using 3D segmentation techniques. In this paper, we present Part123, a novel framework for part-aware 3D reconstruction from a single-view image. We first use diffusion models to generate multiview-consistent images from a given image, and then leverage the Segment Anything Model (SAM), which demonstrates powerful generalization ability on arbitrary objects, to generate multiview segmentation masks. To effectively incorporate 2D part-based information into 3D reconstruction and handle inconsistency across views, we introduce contrastive learning into a neural rendering framework to learn a part-aware feature space based on the multiview segmentation masks. We also develop a clustering-based algorithm to automatically derive 3D part segmentation results from the reconstructed models. Experiments show that our method can generate 3D models with high-quality segmented parts on various objects. Compared to existing unstructured reconstruction methods, the part-aware 3D models from our method benefit several important applications, including feature-preserving reconstruction, primitive fitting, and 3D shape editing.
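
To make the idea of learning a feature space inside a neural rendering framework more concrete, the sketch below shows how per-point features sampled along a camera ray can be blended into a single per-pixel feature by standard front-to-back alpha compositing. This is a generic illustration, not the paper's implementation: the function name `composite_along_ray`, the toy opacities, and the 2-D features are assumptions, and the way a NeuS-style model derives opacities from a signed distance field is not reproduced here.

```python
import numpy as np

def composite_along_ray(alphas, point_features):
    """Blend per-point features into one per-pixel feature (hypothetical helper).

    alphas: (S,) opacity in [0, 1] of each sample along the ray, front to back.
    point_features: (S, D) feature vector predicted at each sample.
    """
    alphas = np.asarray(alphas, dtype=np.float64)
    point_features = np.asarray(point_features, dtype=np.float64)
    # Transmittance T_i = prod_{j<i} (1 - alpha_j): fraction of the ray
    # that reaches sample i without being absorbed earlier.
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    weights = transmittance * alphas
    # Weighted sum of the point features gives the pixel-level feature.
    pixel_feature = weights @ point_features
    return pixel_feature, weights

# Toy ray with four samples and 2-D features (values are made up).
alphas = [0.0, 0.5, 1.0, 0.3]
feats = [[1, 0], [0, 1], [1, 1], [5, 5]]
pixel_feature, weights = composite_along_ray(alphas, feats)
```

Because the third sample is fully opaque, the fourth sample receives zero weight: points behind the surface do not contribute to the pixel feature, so a loss defined on pixels only influences the features of visible surface points.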

Paper Structure

This paper contains 22 sections, 12 figures, 2 tables.

Figures (12)

  • Figure 1: 2D segmentation masks without correspondence and with multi-view inconsistency (highlighted with red boxes). Left: multiview images. Right: 2D segmentation masks from SAM; different colors indicate different parts, and there is no correspondence between masks across views.
  • Figure 2: Overall framework of our method, Part123. Our method takes a single-view image as input and generates its 3D model with segmented part components. For a single-view input, we first generate its multiview images using multiview diffusion. Then their 2D segmentation masks are predicted with a generalizable 2D image segmentation model, SAM, and part-aware reconstruction is conducted based on these 2D segmentations. Finally, the reconstructed model with part segments is built using an automatic algorithm. Note: for the "2D Segmentation" and "Part-aware Model", different colors indicate different parts.
  • Figure 3: (a) Illustration of the sampling strategy for contrastive learning. For each query pixel, its positive sample is selected from the same segmentation mask and its negative sample is from a different mask in the same view. There is no restriction between pixels in different views. (b) The proposed part-aware NeuS network with a part-segment branch. Training losses can be calculated using pixel-level outputs computed with volume rendering through all sampled points along the rays.
  • Figure 4: Qualitative results of part-aware reconstruction by our method, based on 2D segmentation masks with high inconsistency across views. For each object, we show the input single-view image, the generated multiview images along with their corresponding 2D segmentation masks, and the part-aware 3D reconstruction model. Our method can produce 3D models with high-quality part segments in spite of the inconsistent multiview segmentations.
  • Figure 5: Comparison of 3D part segmentation with state-of-the-art methods, including SAM3D, WCSeg, and SEG-MAT. Our method generates high-quality part segments with clear boundaries and meaningful parts, while other methods show inferior results due to noisy boundaries or missing parts.
  • ...and 7 more figures
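
The sampling strategy described in the Figure 3(a) caption can be sketched as follows. For each query pixel, a positive is drawn from the same mask and a negative from a different mask, both within a single view; pixels from different views are never paired, since mask labels have no correspondence across views. The helper names `sample_triplets` and `triplet_loss`, the margin value, and the hinge-style loss are assumptions chosen for illustration; the captions do not specify the exact loss used by the authors.

```python
import numpy as np

def sample_triplets(mask_ids, num_queries, rng):
    """Pick (query, positive, negative) pixel indices within ONE view.

    mask_ids: (P,) integer mask label for each pixel of a single view.
    """
    mask_ids = np.asarray(mask_ids)
    triplets = []
    for q in rng.integers(0, mask_ids.size, size=num_queries):
        same = np.flatnonzero(mask_ids == mask_ids[q])
        same = same[same != q]  # the positive must be a different pixel
        diff = np.flatnonzero(mask_ids != mask_ids[q])
        if same.size == 0 or diff.size == 0:
            continue  # no valid positive or negative for this query
        triplets.append((int(q), int(rng.choice(same)), int(rng.choice(diff))))
    return triplets

def triplet_loss(features, triplets, margin=0.5):
    """Hinge loss on squared distances (an assumed stand-in loss)."""
    if not triplets:
        return 0.0
    q, p, n = (np.array(idx) for idx in zip(*triplets))
    d_pos = np.sum((features[q] - features[p]) ** 2, axis=1)
    d_neg = np.sum((features[q] - features[n]) ** 2, axis=1)
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))

rng = np.random.default_rng(0)
view_masks = np.array([0, 0, 0, 1, 1, 2, 2, 2])  # toy mask labels for one view
triplets = sample_triplets(view_masks, num_queries=16, rng=rng)

# Features that already separate the masks (one-hot per mask) incur no loss.
features = np.eye(3)[view_masks]
loss = triplet_loss(features, triplets)
```

In a multiview setting, `sample_triplets` would be called once per view with that view's own mask labels, so label 0 in one view is never compared against label 0 in another.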