Part-aware Shape Generation with Latent 3D Diffusion of Neural Voxel Fields

Yuhang Huang, Shilong Zou, Xinwang Liu, Kai Xu

TL;DR

This work introduces a latent 3D diffusion framework for neural voxel fields that targets high-resolution, part-aware 3D shape generation. It couples a latent 3D diffusion model with a part-aware shape decoder, which uses cross- and self-attention with a learnable part code to guide part decomposition and rendering. The approach supports multi-modal conditioning (image and text) and reports state-of-the-art results across four ShapeNet classes, generating geometry at $96^3$ resolution with high-quality textures. An end-to-end training paradigm with gradient skip and dedicated regularization improves efficiency and rendering quality, and the learned representation supports shape interpolation and part-based shape mixing. The method is aimed at editable, multi-part 3D modeling with generation guided by image or text inputs.

Abstract

This paper presents a novel latent 3D diffusion model for the generation of neural voxel fields, aiming to achieve accurate part-aware structures. Two key designs distinguish it from existing methods and ensure high-quality, accurate part-aware generation. First, we introduce a latent 3D diffusion process for neural voxel fields, enabling generation at significantly higher resolutions that can accurately capture rich textural and geometric details. Second, we introduce a part-aware shape decoder that integrates the part codes into the neural voxel fields, guiding accurate part decomposition and producing high-quality rendering results. Through extensive experiments and comparisons with state-of-the-art methods, we evaluate our approach across four different classes of data. The results show that the proposed method outperforms existing state-of-the-art methods in part-aware shape generation.

Paper Structure (24 sections, 12 equations, 13 figures, 7 tables)

This paper contains 24 sections, 12 equations, 13 figures, 7 tables.

Figures (13)

  • Figure 1: Our approach involves a latent 3D diffusion process and a part-aware decoding module. The latent 3D diffusion process facilitates the generation of high-resolution voxel fields, while the part-aware decoding module ensures precise part decomposition. Moreover, we accept multi-modal inputs for conditioned shape generation.
  • Figure 2: The proposed method comprises two main modules: a latent 3D diffusion model and a part-aware shape decoder. The latent 3D diffusion model facilitates the high-resolution generation of neural voxel fields, while the part-aware shape decoder enables the generation of part-aware results for rendering. The two modules are trained jointly.
  • Figure 3: Detailed architecture of the 3D UNet. The 3D autoencoder does not have the modules in the dashed purple rectangular box.
  • Figure 4: Qualitative comparison against the state of the art. Part-NeRF-S denotes adding the 2D part supervision to the original Part-NeRF.
  • Figure 5: Visual improvements from the part-aware shape decoder. Without it, the rendering results are inconsistent in the regions marked by yellow circles, which suggests that part-aware information benefits texture learning.
  • ...and 8 more figures