Sin3DM: Learning a Diffusion Model from a Single 3D Textured Shape
Rundi Wu, Ruoshi Liu, Carl Vondrick, Changxi Zheng
TL;DR
Sin3DM addresses single-instance 3D textured shape generation by learning a diffusion model in a compact triplane latent space that encodes the input's signed distance field $d(p)$ and texture field $c(p)$. Its denoiser uses triplane-aware convolutions and a small receptive field, so it captures patch-level variations while preserving the global structure of the input. Generated latents are decoded into textured meshes via marching cubes and texture mapping. Compared with baselines, Sin3DM produces higher-quality geometry and texture, and it supports retargeting, outpainting, and PBR materials; because diffusion runs in the latent space, it stays memory-efficient. The result is a practical way to generate high-quality variations of a 3D asset from a single exemplar for content creation and editing.
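To make the triplane representation concrete, here is a minimal NumPy sketch of how a point $p$ could be decoded into $d(p)$ and $c(p)$: project $p$ onto the three axis-aligned planes, bilinearly sample each plane, sum the features, and pass them through a small MLP. The plane names, the sum aggregation, the $[-1, 1]^3$ bounding box, and the randomly initialized decoder `params` are all illustrative assumptions, not the paper's exact decoder.

```python
import numpy as np


def bilinear_sample(plane, u, v):
    """Bilinearly sample a (C, H, W) plane at continuous coords u, v in [-1, 1].

    u runs along the width axis and v along the height axis; returns (N, C).
    """
    _, H, W = plane.shape
    # Map [-1, 1] onto pixel indices 0..W-1 and 0..H-1 (corners aligned).
    x = (u + 1.0) * 0.5 * (W - 1)
    y = (v + 1.0) * 0.5 * (H - 1)
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    wx, wy = x - x0, y - y0  # fractional offsets inside the cell
    top = plane[:, y0, x0] * (1 - wx) + plane[:, y0, x0 + 1] * wx
    bottom = plane[:, y0 + 1, x0] * (1 - wx) + plane[:, y0 + 1, x0 + 1] * wx
    return (top * (1 - wy) + bottom * wy).T


def query_triplane(planes, points):
    """Aggregate triplane features for (N, 3) points in [-1, 1]^3 by summation."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    return (bilinear_sample(planes["xy"], x, y)
            + bilinear_sample(planes["xz"], x, z)
            + bilinear_sample(planes["yz"], y, z))


def decode(features, params):
    """Tiny hypothetical MLP head: one ReLU layer, then SDF and RGB outputs."""
    h = np.maximum(features @ params["W1"] + params["b1"], 0.0)
    sdf = (h @ params["W_sdf"])[:, 0]                   # d(p), unbounded
    rgb = 1.0 / (1.0 + np.exp(-(h @ params["W_rgb"])))  # c(p) squashed to (0, 1)
    return sdf, rgb


rng = np.random.default_rng(0)
C, R, HIDDEN = 8, 32, 16
planes = {k: rng.standard_normal((C, R, R)) for k in ("xy", "xz", "yz")}
params = {
    "W1": rng.standard_normal((C, HIDDEN)) * 0.1,
    "b1": np.zeros(HIDDEN),
    "W_sdf": rng.standard_normal((HIDDEN, 1)) * 0.1,
    "W_rgb": rng.standard_normal((HIDDEN, 3)) * 0.1,
}
points = rng.uniform(-1.0, 1.0, size=(5, 3))
sdf, rgb = decode(query_triplane(planes, points), params)
print(sdf.shape, rgb.shape)  # (5,) (5, 3)
```

Summing the three plane features keeps the feature width at C; concatenating them would triple it. Evaluating `sdf` on a dense grid of points and running marching cubes on its zero level set would then give the mesh, with `rgb` queried at the surface for texturing.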
Abstract
Synthesizing novel 3D models that resemble an input example has long been pursued by graphics artists and machine learning researchers. In this paper, we present Sin3DM, a diffusion model that learns the internal patch distribution from a single 3D textured shape and generates high-quality variations with fine geometry and texture details. Training a diffusion model directly in 3D would incur large memory and computational costs. Therefore, we first compress the input into a lower-dimensional latent space and then train a diffusion model on it. Specifically, we encode the input 3D textured shape into triplane feature maps that represent the signed distance and texture fields of the input. The denoising network of our diffusion model has a limited receptive field to avoid overfitting, and uses triplane-aware 2D convolution blocks to improve the result quality. Aside from randomly generating new samples, our model also facilitates applications such as retargeting, outpainting, and local editing. Through extensive qualitative and quantitative evaluation, we show that our method outperforms prior methods in the generation quality of 3D shapes.
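Plain 2D convolutions treat the three planes as unrelated images. One way to make a 2D convolution "triplane-aware" is to let each plane see the other two: average-pool each other plane along the axis it does not share with the current plane, broadcast the resulting 1D features back over the current plane, concatenate, and then apply an ordinary small-kernel convolution. The NumPy sketch below assumes that pooling-and-concatenation scheme, a single 3x3 kernel `w` shared by all three planes, and the axis layout `xy: (C, Y, X)`, `xz: (C, Z, X)`, `yz: (C, Z, Y)`; it illustrates the idea rather than reproducing the paper's exact block.

```python
import numpy as np


def conv2d_same(x, w):
    """Stride-1, zero-padded 2D cross-correlation. x: (C_in, H, W), w: (C_out, C_in, k, k)."""
    c_out, _, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    H, W = x.shape[1:]
    out = np.zeros((c_out, H, W))
    for i in range(k):
        for j in range(k):
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + H, j:j + W])
    return out


def along_rows(vec, rows):
    """(C, cols) -> (C, rows, cols): repeat a per-column feature down every row."""
    return np.broadcast_to(vec[:, None, :], (vec.shape[0], rows, vec.shape[1]))


def along_cols(vec, cols):
    """(C, rows) -> (C, rows, cols): repeat a per-row feature across every column."""
    return np.broadcast_to(vec[:, :, None], (vec.shape[0], vec.shape[1], cols))


def fuse(planes):
    """Concatenate each plane with axis-pooled features from the other two planes."""
    xy, xz, yz = planes["xy"], planes["xz"], planes["yz"]  # (C,Y,X), (C,Z,X), (C,Z,Y)
    Y, X = xy.shape[1:]
    Z = xz.shape[1]
    return {
        "xy": np.concatenate([xy, along_rows(xz.mean(axis=1), Y), along_cols(yz.mean(axis=1), X)]),
        "xz": np.concatenate([xz, along_rows(xy.mean(axis=1), Z), along_cols(yz.mean(axis=2), X)]),
        "yz": np.concatenate([yz, along_rows(xy.mean(axis=2), Z), along_cols(xz.mean(axis=2), Y)]),
    }


def triplane_conv(planes, w):
    """Fuse cross-plane context, then run the same small-kernel conv on each plane."""
    return {name: conv2d_same(f, w) for name, f in fuse(planes).items()}


rng = np.random.default_rng(0)
C, R = 4, 16
planes = {k: rng.standard_normal((C, R, R)) for k in ("xy", "xz", "yz")}
w = rng.standard_normal((C, 3 * C, 3, 3)) * 0.1  # C_out = C, C_in = 3C after fusion
out = triplane_conv(planes, w)
print({k: v.shape for k, v in out.items()})
```

Within a plane, stacking L stride-1 3x3 convolutions gives a receptive field of 2L + 1 texels, so a shallow stack only sees local patches, which is the property the abstract relies on to avoid overfitting to the whole input. The pooled cross-plane features are the exception: they carry information along entire rows and columns, which is how each plane learns about the 3D context it shares with the other two.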
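Because diffusion runs on the triplane latent rather than on a dense 3D grid, each training step only has to noise and denoise 2D feature maps. The sketch below assumes a standard DDPM setup (linear $\beta$ schedule, $T = 1000$ steps, noise-prediction target), which may differ from the paper's exact configuration. It applies the closed-form forward process $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ to a triplane latent and computes the element counts that motivate the latent space. The `denoiser` argument is a hypothetical placeholder for the network, and the sizes `C=12`, `R=96` are illustrative.

```python
import numpy as np


def alpha_bar_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)


def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t ~ q(x_t | x_0) in closed form; returns (x_t, eps)."""
    eps = rng.standard_normal(x0.shape)
    a = alpha_bar[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps, eps


def training_loss(x0, denoiser, alpha_bar, rng):
    """One noise-prediction step: MSE between true and predicted noise."""
    t = rng.integers(len(alpha_bar))
    x_t, eps = q_sample(x0, t, alpha_bar, rng)
    return np.mean((denoiser(x_t, t) - eps) ** 2)


def element_counts(C, R):
    """Values stored by a C-channel triplane vs. a dense C-channel R^3 grid."""
    return 3 * C * R * R, C * R ** 3


rng = np.random.default_rng(0)
alpha_bar = alpha_bar_schedule()
triplane = rng.standard_normal((3, 8, 32, 32))  # (planes, C, R, R) latent
loss = training_loss(triplane, lambda x, t: np.zeros_like(x), alpha_bar, rng)
tri, grid = element_counts(C=12, R=96)
print(tri, grid, grid // tri)  # 331776 10616832 32
```

The ratio between the two counts is always R/3, so the savings grow with resolution: at R = 96 the dense grid stores 32 times as many values as the triplane.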
