Efficient Part-level 3D Object Generation via Dual Volume Packing

Jiaxiang Tang, Ruijie Lu, Zhaoshuo Li, Zekun Hao, Xuan Li, Fangyin Wei, Shuran Song, Gang Zeng, Ming-Yu Liu, Tsung-Yi Lin

TL;DR

This work addresses the challenge of generating editable, part-level 3D objects from a single image, where objects have an unknown and variable number of parts. It introduces dual volume packing, which converts part connectivity into a bipartite structure and packs the parts into two disjoint volumes, keeping contacting parts separate while maintaining a fixed output length. A dual latent generation pipeline combines a VAE, a DINOv2-conditioned encoder, and a rectified flow to produce two volumes that assemble into the final object, with part-level data curation to support training. Experiments show competitive quality, diversity, and efficiency (about 30 seconds per sample) compared with segmentation-based baselines, highlighting improved part separation and end-to-end generation without segmentation priors. Limitations include granularity control and the restriction to bipartite graph structures, suggesting avenues for extending to more volumes and alternative graph colorings for greater expressiveness.

Abstract

Recent progress in 3D object generation has greatly improved both the quality and efficiency. However, most existing methods generate a single mesh with all parts fused together, which limits the ability to edit or manipulate individual parts. A key challenge is that different objects may have a varying number of parts. To address this, we propose a new end-to-end framework for part-level 3D object generation. Given a single input image, our method generates high-quality 3D objects with an arbitrary number of complete and semantically meaningful parts. We introduce a dual volume packing strategy that organizes all parts into two complementary volumes, allowing for the creation of complete and interleaved parts that assemble into the final object. Experiments show that our model achieves better quality, diversity, and generalization than previous image-based part-level generation methods.

Paper Structure

This paper contains 21 sections, 1 equation, 10 figures, 1 table, 1 algorithm.

Figures (10)

  • Figure 1: End-to-end Part-level Image-to-3D Generation. We present a method to generate high-quality 3D shapes composed of individual and complete parts from a single-view image. Our method is trained only with 3D native information and can generate part-level meshes in about 30 seconds without relying on 2D segmentation prior models.
  • Figure 2: Dual Volume Packing. Given a 3D mesh with part-level annotations, we propose to convert the part-connectivity graph into a bipartite graph, such that all parts can be packed into two volumes. Within each volume, parts do not contact each other and can thus be separated during mesh extraction.
  • Figure 3: Network Architecture. Our model takes a single-view image as the input condition and generates the dual latents at the same time with a flow model. The latents are decoded to dual volumes, which can be divided into parts and assembled back into the whole mesh.
  • Figure 4: Comparison on Image-to-3D Generation. Our method generates part-level meshes with competitive quality from single-view images compared to previous methods.
  • Figure 5: Comparison on Part-level 3D Generation. Our method directly generates complete parts, while other methods require mesh segmentation and part completion.
  • ...and 5 more figures
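
The packing idea in the Figure 2 caption can be illustrated with a small sketch. It treats parts as graph nodes and contacts as edges, then assigns each part to one of two volumes with a breadth-first 2-coloring so that contacting parts land in different volumes. The function name `two_color_parts`, the edge-list input format, and the choice to merely report conflicting edges are assumptions made for illustration; this section does not say how the authors build the connectivity graph or how they handle graphs that are not already bipartite.

```python
from collections import deque


def two_color_parts(num_parts, contacts):
    """Assign each part to volume 0 or 1 (hypothetical helper, not the paper's code).

    contacts is a list of (a, b) index pairs for parts that touch.
    Returns (volume, conflicts): volume[i] is 0 or 1, and conflicts lists the
    contact edges whose endpoints ended up in the same volume. conflicts is
    non-empty only when the contact graph is not bipartite.
    """
    adjacency = [[] for _ in range(num_parts)]
    for a, b in contacts:
        adjacency[a].append(b)
        adjacency[b].append(a)

    volume = [-1] * num_parts  # -1 marks a part that has not been visited yet
    for start in range(num_parts):
        if volume[start] != -1:
            continue
        volume[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if volume[neighbor] == -1:
                    # Put a contacting part in the opposite volume.
                    volume[neighbor] = 1 - volume[node]
                    queue.append(neighbor)

    # Edges that the 2-coloring could not separate (odd cycles).
    conflicts = [(a, b) for a, b in contacts if volume[a] == volume[b]]
    return volume, conflicts


# Toy example: a seat (0) touching four legs (1-4) and a backrest (5).
chair_contacts = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5)]
chair_volume, chair_conflicts = two_color_parts(6, chair_contacts)
print(chair_volume, chair_conflicts)
```

In the toy example the seat goes into one volume and the legs and backrest into the other, so no two parts in the same volume touch. Three mutually touching parts would produce a non-empty `conflicts` list, which corresponds to the bipartite restriction noted in the TL;DR.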