Diffusion-SDF: Conditional Generative Modeling of Signed Distance Functions
Gene Chou, Yuval Bahat, Felix Heide
TL;DR
Diffusion-SDF reframes 3D shape generation as diffusion in the latent space of neural signed distance functions (SDFs), enabling both unconditional generation and conditional shape completion from partial point clouds, real scans, and 2D images. The method introduces a modulation module that compresses SDFs into latent vectors via a PointNet–VAE pair, followed by a diffusion model that denoises these latents. Conditioning is achieved through modality-specific encoders and cross-attention, which allows guidance from multiple input modalities. End-to-end training with geometry-consistency losses couples the SDF, latent, and diffusion components to produce plausible, diverse surfaces, and experiments show strong performance on unconditional and conditional tasks across large-scale datasets. The work demonstrates scalable 3D generation in implicit representations and opens avenues for multi-modal, text-to-shape, and scene-level synthesis.
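To make "diffusion in latent space" concrete, the following is a minimal sketch of the standard closed-form forward noising step applied to a single latent vector. It is an illustration under stated assumptions, not the authors' implementation: the linear beta schedule, the 1000-step horizon, the latent size of 256, and the function names `make_alpha_bar` and `noise_latent` are all hypothetical choices made for this example.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def noise_latent(z0, t, alpha_bar, rng):
    """Draw z_t from q(z_t | z_0) in closed form; returns (z_t, eps)."""
    eps = rng.standard_normal(z0.shape)
    z_t = np.sqrt(alpha_bar[t]) * z0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return z_t, eps

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
# Stand-in for a shape latent produced by an encoder; 256 is an assumed size.
z0 = rng.standard_normal(256)
z_t, eps = noise_latent(z0, t=999, alpha_bar=alpha_bar, rng=rng)
```

A denoising network would be trained to undo this corruption, optionally attending to features of a partial point cloud or image. That network is omitted here because its architecture is not specified in this summary.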
Abstract
Probabilistic diffusion models have achieved state-of-the-art results for image synthesis, inpainting, and text-to-image tasks. However, they are still in the early stages of generating complex 3D shapes. This work proposes Diffusion-SDF, a generative model for shape completion, single-view reconstruction, and reconstruction of real-scanned point clouds. We use neural signed distance functions (SDFs) as our 3D representation to parameterize the geometry of various signals (e.g., point clouds, 2D images) through neural networks. Neural SDFs are implicit functions and diffusing them amounts to learning the reversal of their neural network weights, which we solve using a custom modulation module. Extensive experiments show that our method is capable of both realistic unconditional generation and conditional generation from partial inputs. This work expands the domain of diffusion models from learning 2D, explicit representations, to 3D, implicit representations.
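For readers unfamiliar with signed distance functions, the sketch below evaluates an analytic SDF for a sphere. It only illustrates the quantity a neural SDF is trained to approximate. The function name `sphere_sdf` and the negative-inside sign convention are assumptions of this example, and a neural SDF would replace the closed-form expression with a learned network.

```python
import numpy as np

def sphere_sdf(points, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance to a sphere: negative inside, zero on the surface, positive outside."""
    points = np.asarray(points, dtype=float)
    return np.linalg.norm(points - np.asarray(center, dtype=float), axis=-1) - radius

# Three query points along the x-axis: inside, on, and outside the unit sphere.
queries = np.array([[0.0, 0.0, 0.0],
                    [1.0, 0.0, 0.0],
                    [2.0, 0.0, 0.0]])
distances = sphere_sdf(queries)  # signed distances -1, 0, and 1 respectively
```

The surface is the zero level set of the function, so a mesh can be extracted by finding where the predicted distance changes sign.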
