Efficient and Scalable Point Cloud Generation with Sparse Point-Voxel Diffusion Models
Ioannis Romanelis, Vlassios Fotis, Athanasios Kalogeras, Christos Alexakos, Konstantinos Moustakas, Adrian Munteanu
TL;DR
This work introduces SPVD, a sparse point-voxel diffusion U-Net that combines a high-fidelity point branch with a sparse voxel backbone for efficient, scalable 3D point cloud generation. Using GPU-based voxelization and a graph-structured integration of time embeddings, SPVD generates shapes faster than prior diffusion models (the largest variant runs in roughly 70% of the time of PVD) while achieving state-of-the-art results among diffusion methods on ShapeNet. It also scales to conditional generation across all ShapeNet categories, supports implicit generation in fewer timesteps, and handles completion of partial shapes and super-resolution (upsampling point density), making diffusion-based 3D generation more practical. The implementation is public, and the authors discuss future directions, including latent diffusion pipelines and guidance, for scaling to larger datasets such as Objaverse.
Abstract
We propose a novel point cloud U-Net diffusion architecture for 3D generative modeling, capable of generating high-quality and diverse 3D shapes while maintaining fast generation times. Our network employs a dual-branch architecture, combining the high-resolution representations of points with the computational efficiency of sparse voxels. Our fastest variant outperforms all non-diffusion generative approaches on unconditional shape generation, the most popular benchmark for evaluating point cloud generative models, while our largest model achieves state-of-the-art results among diffusion methods, with a runtime approximately 70% of that of PVD, the previous state of the art. Beyond unconditional generation, we perform extensive evaluations, including conditional generation on all categories of ShapeNet, which demonstrates the scalability of our model to larger datasets, and implicit generation, which allows our network to produce high-quality point clouds in fewer timesteps, further decreasing the generation time. Finally, we evaluate the architecture's performance in point cloud completion and super-resolution. Our model excels in all tasks, establishing it as a state-of-the-art diffusion U-Net for point cloud generative modeling. The code is publicly available at https://github.com/JohnRomanelis/SPVD.git.
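To make the dual-branch idea concrete, the following is a minimal NumPy sketch of one point-voxel round trip: per-point features are averaged into the occupied voxels only, a voxel branch processes that coarse, sparse set, and the result is gathered back and fused with a per-point branch. It is an illustration under stated assumptions, not the SPVD implementation: the function names (`voxelize`, `devoxelize`, `point_voxel_block`), the nearest-voxel gather, the additive fusion, and the placeholder branch functions are all hypothetical, and the actual model uses learned sparse convolutions on the GPU.

```python
import numpy as np

def voxelize(points, feats, voxel_size):
    """Average per-point features into occupied voxels only (a sparse grid)."""
    # Integer voxel index of every point.
    coords = np.floor(points / voxel_size).astype(np.int64)
    # Unique occupied voxels; `inverse` maps each point to its voxel row.
    voxel_coords, inverse = np.unique(coords, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # inverse shape differs across NumPy versions
    sums = np.zeros((len(voxel_coords), feats.shape[1]), dtype=np.float64)
    np.add.at(sums, inverse, feats)  # scatter-add features per voxel
    counts = np.bincount(inverse, minlength=len(voxel_coords))
    return voxel_coords, sums / counts[:, None], inverse

def devoxelize(voxel_feats, inverse):
    """Gather each point's voxel feature (nearest-voxel lookup, an assumption)."""
    return voxel_feats[inverse]

def point_voxel_block(points, feats, voxel_size, voxel_fn, point_fn):
    """Fuse a coarse voxel branch with a fine per-point branch by addition."""
    _, voxel_feats, inverse = voxelize(points, feats, voxel_size)
    coarse = devoxelize(voxel_fn(voxel_feats), inverse)  # voxel branch
    fine = point_fn(feats)                               # point branch
    return coarse + fine

# Toy input: two points share a voxel, the third sits in its own voxel.
points = np.array([[0.1, 0.1, 0.1], [0.2, 0.2, 0.2], [0.9, 0.9, 0.9]])
feats = np.array([[1.0], [3.0], [5.0]])
identity = lambda x: x  # stand-in for learned layers
fused = point_voxel_block(points, feats, 0.5, identity, identity)
```

With identity branches, the two points that share a voxel receive the same averaged coarse feature while keeping their own fine feature, which shows why the point branch preserves detail that voxelization alone would discard.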
