Part-aware Shape Generation with Latent 3D Diffusion of Neural Voxel Fields
Yuhang Huang, Shilong Zou, Xinwang Liu, Kai Xu
TL;DR
This work introduces a latent 3D diffusion framework for neural voxel fields to achieve high-resolution, part-aware 3D shape generation. It couples a latent 3D diffusion model with a part-aware shape decoder that uses cross- and self-attention with a learnable part code to enforce accurate part decomposition and rendering. The approach supports multi-modal conditioning (image and text) and reports state-of-the-art results across four ShapeNet classes, with high geometric fidelity at $96^3$ resolution and high-quality textures. An end-to-end training paradigm with gradient skip and dedicated regularization further improves efficiency and rendering quality, and the learned representation supports shape interpolation and part-based shape mixing. The result is an editable, multi-part 3D modeling pipeline whose generation can be guided by image or text inputs.
Abstract
This paper presents a novel latent 3D diffusion model for the generation of neural voxel fields, aiming to achieve accurate part-aware structures. Compared to existing methods, our approach relies on two key designs to ensure high-quality and accurate part-aware generation. First, we introduce a latent 3D diffusion process for neural voxel fields, enabling generation at significantly higher resolutions that can accurately capture rich textural and geometric details. Second, we introduce a part-aware shape decoder that integrates the part codes into the neural voxel fields, guiding accurate part decomposition and producing high-quality rendering results. Through extensive experiments and comparisons with state-of-the-art methods, we evaluate our approach across four different classes of data. The results show that our method outperforms existing state-of-the-art methods in part-aware shape generation.
