Neural Point Cloud Diffusion for Disentangled 3D Shape and Appearance Generation

Philipp Schröppel, Christopher Wewer, Jan Eric Lenssen, Eddy Ilg, Thomas Brox

TL;DR

This work introduces Neural Point Cloud Diffusion (NPCD), a diffusion-based framework that operates on a hybrid neural point cloud and radiance field to disentangle 3D shape from appearance. By training a category-level Point-NeRF autodecoder, NPCD yields high-quality samples where geometry and texture can be sampled independently, enabling explicit shape or appearance control. Empirical results show state-of-the-art disentangled generation compared to GAN-based baselines and competitive performance against other diffusion methods that do not support disentanglement, with thorough ablations on initialization, feature dimensionality, and regularization. The approach offers practical benefits for controllable 3D asset creation and provides insights into mitigating many-to-one mappings in autodecoded latent spaces through targeted regularization and initialization strategies.

Abstract

Controllable generation of 3D assets is important for many practical applications, such as content creation for movies, games, and engineering, as well as AR/VR. Recently, diffusion models have shown remarkable results in the generation quality of 3D objects. However, none of the existing models enables disentangled generation to control shape and appearance separately. For the first time, we present a suitable representation for 3D diffusion models to enable such disentanglement by introducing a hybrid point cloud and neural radiance field approach. We model a diffusion process over point positions jointly with a high-dimensional feature space for a local density and radiance decoder. While the point positions represent the coarse shape of the object, the point features allow modeling the geometry and appearance details. This disentanglement enables us to sample both independently and therefore to control both separately. Our approach sets a new state of the art in generation compared to previous disentanglement-capable methods, reducing FID scores by 30-90%, and is on par with other state-of-the-art methods that are not disentanglement-capable.
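
To make the representation described in the abstract more concrete, here is a minimal sketch of a neural point cloud together with a standard DDPM-style forward noising step applied jointly to positions and features. The linear beta schedule, the array shapes, and all names (`NeuralPointCloud`, `make_alpha_bars`, `noise_cloud`) are illustrative assumptions and not the authors' implementation; the abstract only states that point positions and features are diffused jointly and can be sampled independently.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class NeuralPointCloud:
    positions: np.ndarray  # (N, 3): coarse object shape
    features: np.ndarray   # (N, D): local geometry and appearance features


def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    # Assumed linear beta schedule (a common DDPM default, not taken from the paper).
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def noise_cloud(cloud, t, alpha_bars, rng, noise_positions=True, noise_features=True):
    """Sample a noised cloud at step t via x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps."""
    a = alpha_bars[t]

    def q_sample(x):
        eps = rng.standard_normal(x.shape)
        return np.sqrt(a) * x + np.sqrt(1.0 - a) * eps

    pos = q_sample(cloud.positions) if noise_positions else cloud.positions.copy()
    feat = q_sample(cloud.features) if noise_features else cloud.features.copy()
    return NeuralPointCloud(pos, feat)


# Hypothetical toy cloud: 512 points with 32-dimensional features.
rng = np.random.default_rng(0)
cloud = NeuralPointCloud(
    positions=rng.uniform(-1.0, 1.0, size=(512, 3)),
    features=rng.standard_normal((512, 32)),
)
alpha_bars = make_alpha_bars()

# Noise only the features while keeping the coarse shape fixed.
noisy = noise_cloud(cloud, t=999, alpha_bars=alpha_bars, rng=rng, noise_positions=False)
```

Holding the positions fixed while noising the features is one way the shape/appearance split could be expressed in code; a trained denoising network, which is omitted here, would be needed to map the noised part back to a clean sample.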

Paper Structure (66 sections, 9 equations, 14 figures, 5 tables, 1 algorithm)


Figures (14)

  • Figure 1: We present a method to model 3D radiance field distributions using neural point denoising diffusion (left). Since our representation disentangles coarse object shape from local appearance, we can sample from the individual distributions separately (right).
  • Figure 2: Overview of neural point cloud diffusion (NPCD). In the center we have a neural point cloud representation, where each point has a position ($\blacksquare$) and an appearance feature ($\blacksquare$). The neural point cloud can be generated with a diffusion model (top) and can be rendered via ray integration (bottom).
  • Figure 3: Qualitative examples of disentangled generation on SRN cars, SRN chairs, and PhotoShape chairs. Appearance-only generation: we show a generated object and objects with re-sampled appearance. Shape-only generation: we show a generated object and objects with re-sampled coarse shape. We can get diverse samples of local appearance or coarse shape when the respective other is given.
  • Figure 4: Comparison against previous generative models that allow disentangled generation. While we present the first diffusion model allowing disentangled generation, earlier works are GAN-based. Our model generates examples of much higher quality, as is also evident from the metrics in Tab. \ref{tab:DisnentangledComparison}.
  • Figure A1: Visualization of the neural point cloud diffusion process. We generate the shape and appearance of 3D objects on ShapeNet Cars, ShapeNet Chairs, and PhotoShape Chairs with the proposed Neural Point Cloud Diffusion (NPCD) model. We visualize the neural point clouds $\mathcal{P}_t = (\mathbf{P}_t, \mathbf{F}_t)$ from intermediate timesteps $t$ of the diffusion process. In total, the diffusion process of NPCD has 1000 timesteps and we visualize every 100th timestep. The features of the neural point clouds are visualized by taking the first three PCA components as RGB color. The last visualized neural point cloud $\mathcal{P}_0$ represents the final generated 3D object. Additionally, we visualize a Point-NeRF rendering of the final neural point cloud.
  • ...and 9 more figures
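
The Figure A1 caption mentions visualizing point features by taking their first three PCA components as RGB color. A minimal sketch of that idea follows. The function name `features_to_rgb`, the SVD-based PCA, and the per-channel min-max rescaling to [0, 1] are assumptions for illustration; the caption does not specify how the components are normalized.

```python
import numpy as np


def features_to_rgb(features):
    """Map (N, D) point features to (N, 3) colors in [0, 1] via PCA (requires N, D >= 3)."""
    centered = features - features.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:3].T  # (N, 3) coordinates along the top three components
    # Assumed normalization: rescale each channel independently to [0, 1].
    lo = projected.min(axis=0, keepdims=True)
    hi = projected.max(axis=0, keepdims=True)
    return (projected - lo) / np.maximum(hi - lo, 1e-12)


# Hypothetical features: 256 points with 32 dimensions each.
rng = np.random.default_rng(0)
feats = rng.standard_normal((256, 32))
rgb = features_to_rgb(feats)
```

The resulting `rgb` array can be passed as per-point colors to any scatter-plot or point cloud viewer alongside the point positions.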