NeuSDFusion: A Spatial-Aware Generative Model for 3D Shape Completion, Reconstruction, and Generation
Ruikai Cui, Weizhe Liu, Weixuan Sun, Senbo Wang, Taizhang Shang, Yang Li, Xibin Song, Han Yan, Zhennan Wu, Shenzhou Chen, Hongdong Li, Pan Ji
TL;DR
NeuSDFusion addresses the challenge of producing diverse, high-fidelity 3D shapes with spatially coherent structure under memory constraints. It introduces NeuSDF, a hybrid representation that encodes objects on three orthogonal planes, and a transformer-based spatial-aware autoencoder to compress these planes into latent tri-planes. A latent diffusion model then generates these tri-planes under multi-modal conditioning (text, images, or point clouds) and decodes them into dense SDFs for marching cubes reconstruction. Across unconditional generation, multi-modal completion, single-view reconstruction, and language-guided generation, the approach achieves state-of-the-art results, highlighting improved quality, diversity, and efficiency for 3D generation tasks.
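As a rough illustration of the plane-based SDF query described above, the following NumPy sketch samples three axis-aligned feature planes at 3D points by bilinear interpolation, aggregates the per-plane features, and decodes them into signed distances on a dense grid of the kind marching cubes would consume. Everything here is an assumption made for illustration, not the paper's implementation: the function names (bilinear_sample, query_triplane, toy_decoder), the sum aggregation, and the resolutions are hypothetical, and the hand-built planes and closed-form decoder stand in for the learned planes and learned decoder so that the output can be checked against an analytic sphere.

```python
import numpy as np

def bilinear_sample(plane, uv):
    """Sample a (R, R, C) feature plane at (N, 2) coordinates in [-1, 1]."""
    res = plane.shape[0]
    # Map [-1, 1] to continuous pixel coordinates in [0, res - 1].
    xy = (np.clip(uv, -1.0, 1.0) + 1.0) * 0.5 * (res - 1)
    lo = np.clip(np.floor(xy).astype(int), 0, res - 2)
    frac = xy - lo
    x0, y0 = lo[:, 0], lo[:, 1]
    fx, fy = frac[:, :1], frac[:, 1:]
    # Interpolate along the first axis, then along the second.
    near = plane[x0, y0] * (1 - fx) + plane[x0 + 1, y0] * fx
    far = plane[x0, y0 + 1] * (1 - fx) + plane[x0 + 1, y0 + 1] * fx
    return near * (1 - fy) + far * fy

def query_triplane(planes, points):
    """Project (N, 3) points onto the xy, xz, and yz planes and sum the features.

    Summation is an assumed aggregation; concatenation is another common choice.
    """
    f_xy = bilinear_sample(planes["xy"], points[:, [0, 1]])
    f_xz = bilinear_sample(planes["xz"], points[:, [0, 2]])
    f_yz = bilinear_sample(planes["yz"], points[:, [1, 2]])
    return f_xy + f_xz + f_yz

def toy_decoder(features, radius=0.5):
    """Closed-form stand-in for a learned decoder: squared distance -> sphere SDF."""
    return np.sqrt(np.maximum(features[:, 0], 0.0)) - radius

# Hand-built planes (assumption): each stores (u^2 + v^2) / 2, so the three
# sampled features sum to x^2 + y^2 + z^2 at any grid-aligned point.
res = 65
grid = np.linspace(-1.0, 1.0, res)
u, v = np.meshgrid(grid, grid, indexing="ij")
half_sq = ((u ** 2 + v ** 2) / 2.0)[..., None]  # shape (res, res, 1)
planes = {"xy": half_sq, "xz": half_sq, "yz": half_sq}

# Query a few individual points: inside, outside, and on the sphere of radius 0.5.
pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.5, 0.0]])
sdf = toy_decoder(query_triplane(planes, pts))

# Evaluate a dense SDF volume; a marching cubes routine would take this as input.
n = 17
axis = np.linspace(-1.0, 1.0, n)
gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
grid_pts = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)
volume = toy_decoder(query_triplane(planes, grid_pts)).reshape(n, n, n)
```

The point of the sketch is the memory argument: the three planes hold 3 x R x R x C values, whereas a dense feature volume would hold R x R x R x C, yet any 3D location can still be queried continuously. Between grid nodes the bilinear lookup only approximates the quadratic stored in this toy setup, which is one reason a learned decoder is used in practice.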
Abstract
3D shape generation aims to produce innovative 3D content adhering to specific conditions and constraints. Existing methods often decompose 3D shapes into a sequence of localized components, treating each element in isolation without considering spatial consistency. As a result, these approaches exhibit limited versatility in 3D data representation and shape generation, hindering their ability to generate highly diverse 3D shapes that comply with the specified constraints. In this paper, we introduce a novel spatial-aware 3D shape generation framework that leverages 2D plane representations for enhanced 3D shape modeling. To ensure spatial coherence and reduce memory usage, we incorporate a hybrid shape representation technique that directly learns a continuous signed distance field representation of the 3D shape using orthogonal 2D planes. Additionally, we meticulously enforce spatial correspondences across distinct planes using a transformer-based autoencoder structure, promoting the preservation of spatial relationships in the generated 3D shapes. This yields an algorithm that consistently outperforms state-of-the-art 3D shape generation methods on various tasks, including unconditional shape generation, multi-modal shape completion, single-view reconstruction, and text-to-shape synthesis. Our project page is available at https://weizheliu.github.io/NeuSDFusion/ .
