Part-aware Shape Generation with Latent 3D Diffusion of Neural Voxel Fields

Yuhang Huang; Shilong Zou; Xinwang Liu; Kai Xu

TL;DR

This work introduces a latent 3D diffusion framework for neural voxel fields that targets high-resolution, part-aware 3D shape generation. It couples a latent 3D diffusion model with a part-aware shape decoder, which uses cross- and self-attention with a learnable part code to guide part decomposition and rendering. The approach supports multi-modal conditioning (image and text) and reports state-of-the-art results on four ShapeNet classes, generating voxel fields at $96^3$ resolution with detailed geometry and texture. An end-to-end training paradigm with gradient skip and dedicated regularization improves efficiency and rendering quality, and the learned representation supports shape interpolation and part-based shape mixing. These properties make the method suited to editable, multi-part 3D modeling guided by diverse inputs.

Abstract

This paper presents a novel latent 3D diffusion model for the generation of neural voxel fields, aiming to achieve accurate part-aware structures. Compared to existing methods, two key designs ensure high-quality and accurate part-aware generation. First, we introduce a latent 3D diffusion process for neural voxel fields, enabling generation at significantly higher resolutions that can accurately capture rich textural and geometric details. Second, we introduce a part-aware shape decoder that integrates part codes into the neural voxel fields, guiding accurate part decomposition and producing high-quality rendering results. We evaluate our approach on four different classes of data through extensive experiments and comparisons with state-of-the-art methods. The results show that the proposed method outperforms existing state-of-the-art methods in part-aware shape generation.
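To make the idea of integrating part codes into voxel features more concrete, the following is a toy sketch of a single cross-attention step in which each voxel feature attends over a small set of part codes. It is an illustration under stated assumptions, not the authors' decoder: the function names, the toy sizes, the omission of learned query/key/value projections, and the argmax part labeling are all hypothetical, and random vectors stand in for learned features and part codes.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(voxel_feats, part_codes):
    """Each voxel feature (query) attends over the part codes (keys and values).

    Returns (updated_feats, weights), where weights[i][k] is how strongly
    voxel i attends to part k. Learned Q/K/V projections are omitted
    (treated as identity) to keep the sketch short; this is an assumption.
    """
    dim = len(part_codes[0])
    scale = 1.0 / math.sqrt(dim)
    updated, weights = [], []
    for query in voxel_feats:
        # Scaled dot-product score between this voxel and every part code.
        scores = [scale * sum(a * b for a, b in zip(query, code))
                  for code in part_codes]
        w = softmax(scores)
        # Attention-weighted mixture of the part codes.
        mixed = [sum(w[k] * part_codes[k][d] for k in range(len(part_codes)))
                 for d in range(dim)]
        # Residual connection: inject part information into the voxel feature.
        updated.append([q + m for q, m in zip(query, mixed)])
        weights.append(w)
    return updated, weights


random.seed(0)
NUM_VOXELS, NUM_PARTS, DIM = 8, 4, 16  # toy sizes, not taken from the paper
voxel_feats = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_VOXELS)]
part_codes = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_PARTS)]

updated, weights = cross_attend(voxel_feats, part_codes)
# Hypothetical hard part label per voxel: the part with the largest weight.
part_labels = [max(range(NUM_PARTS), key=lambda k: w[k]) for w in weights]
```

In this sketch the attention weights act as a soft assignment of voxels to parts, which is one plausible way a part code could steer decomposition; the paper's actual decoder also uses self-attention and is trained jointly with the diffusion model.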

Paper Structure (24 sections, 12 equations, 13 figures, 7 tables)

Introduction
Related Work
Proposed Method
Preliminary
Neural Voxel Fields
Decoupled Diffusion Models
Latent 3D Diffusion Model
Part-aware Shape Decoder
End-to-end Training with Gradient Skip
Detailed Architectures
Experiments
Experimental Setup
Dataset
Implementation Details
Conditioned Generation
...and 9 more sections

Figures (13)

Figure 1: Our approach involves a latent 3D diffusion process and a part-aware decoding module. The latent 3D diffusion process facilitates the generation of high-resolution voxel fields, while the part-aware decoding module ensures precise part decomposition. Moreover, we support multi-modal inputs for conditioned shape generation.
Figure 2: The proposed method comprises two main modules: a latent 3D diffusion model and a part-aware shape decoder. The latent 3D diffusion model facilitates the high-resolution generation of neural voxel fields, while the part-aware shape decoder enables the generation of part-aware results for rendering. The two modules are trained jointly.
Figure 3: Detailed architecture of the 3D UNet. The 3D autoencoder does not have the modules in the dashed purple rectangular box.
Figure 4: Qualitative comparison against state of the art. Part-NeRF-S denotes adding the 2D part supervision to the original Part-NeRF.
Figure 5: Visual improvements of the part-aware shape decoder. Without the part-aware shape decoder, there are inconsistent rendering results in the yellow circles, which shows that part-aware information benefits texture learning.
...and 8 more figures
