Efficient Part-level 3D Object Generation via Dual Volume Packing
Jiaxiang Tang, Ruijie Lu, Zhaoshuo Li, Zekun Hao, Xuan Li, Fangyin Wei, Shuran Song, Gang Zeng, Ming-Yu Liu, Tsung-Yi Lin
TL;DR
This work addresses the challenge of generating editable, part-level 3D objects from a single image, where objects have an unknown and variable number of parts. It introduces dual volume packing, which converts part connectivity into a bipartite structure and packs the parts into two disjoint volumes, keeping contacting parts separate while maintaining a fixed output length. A dual latent generation pipeline combines a VAE, DINOv2-conditioned encoder, and a rectified flow to produce two volumes that assemble into the final object, with part-level data curation to support training. Experiments show competitive quality, diversity, and efficiency (about 30 seconds per sample) compared with segmentation-based baselines, highlighting improved part separation and end-to-end generation without segmentation priors. Limitations include limited control over part granularity and the restriction to bipartite graph structures, suggesting avenues for extending to more volumes and alternative graph colorings for greater expressiveness.
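The packing idea can be illustrated with a minimal sketch. It assumes the parts and their contact relations are already known as a graph, and it assigns each part to one of two volumes with a breadth-first 2-coloring so that no two contacting parts share a volume. The function name `pack_into_two_volumes`, the integer part indices, and the `contacts` edge list are hypothetical and chosen for illustration only; this is not the authors' implementation, and it simply reports failure when the contact graph is not bipartite rather than reproducing how the paper handles that case.

```python
from collections import deque


def pack_into_two_volumes(num_parts, contacts):
    """Assign each part to volume 0 or 1 so contacting parts are separated.

    num_parts: number of parts, indexed 0..num_parts-1 (hypothetical layout).
    contacts:  iterable of (a, b) pairs for parts that touch each other.
    Returns a list of volume indices, or None if the contact graph is not
    bipartite (some odd cycle of contacts exists).
    """
    adjacency = [[] for _ in range(num_parts)]
    for a, b in contacts:
        adjacency[a].append(b)
        adjacency[b].append(a)

    volume = [-1] * num_parts  # -1 marks a part that is not yet assigned
    for start in range(num_parts):
        if volume[start] != -1:
            continue
        # Each connected component is colored independently, starting at 0.
        volume[start] = 0
        queue = deque([start])
        while queue:
            part = queue.popleft()
            for neighbor in adjacency[part]:
                if volume[neighbor] == -1:
                    # A contacting part goes into the opposite volume.
                    volume[neighbor] = 1 - volume[part]
                    queue.append(neighbor)
                elif volume[neighbor] == volume[part]:
                    # Two contacting parts would share a volume.
                    return None
    return volume


# Toy chair: seat (0) touches four legs (1-4) and the back (5).
chair_contacts = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5)]
print(pack_into_two_volumes(6, chair_contacts))  # [0, 1, 1, 1, 1, 1]
```

In this toy example the seat lands in one volume and the legs and back in the other, so every pair of touching parts is split across the two volumes regardless of how many parts the object has.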
Abstract
Recent progress in 3D object generation has greatly improved both quality and efficiency. However, most existing methods generate a single mesh with all parts fused together, which limits the ability to edit or manipulate individual parts. A key challenge is that different objects may have a varying number of parts. To address this, we propose a new end-to-end framework for part-level 3D object generation. Given a single input image, our method generates high-quality 3D objects with an arbitrary number of complete and semantically meaningful parts. We introduce a dual volume packing strategy that organizes all parts into two complementary volumes, allowing for the creation of complete and interleaved parts that assemble into the final object. Experiments show that our model achieves better quality, diversity, and generalization than previous image-based part-level generation methods.
