LDM: Large Tensorial SDF Model for Textured Mesh Generation
Rengan Xie, Wenting Zheng, Kai Huang, Yizheng Chen, Qi Wang, Qi Ye, Wei Chen, Yuchi Huo
TL;DR
This work tackles fast, high-quality 3D asset generation from text or a single image without per-object optimization. It introduces LDM, a feed-forward pipeline with two learned components and a refinement step. A conditional multi-view diffusion model first generates four input views. A transformer-based tensorial SDF reconstructor then predicts a unified tensorial SDF field from those views, and gradient-based mesh refinement follows. Geometry and appearance share a single tensorial SDF representation, and color is decoupled into albedo and shading, which enables reliable relighting and material editing. Training proceeds in two stages: volume rendering first learns global features, and FlexiCubes-based refinement then sharpens local detail. The result is high-quality textured meshes in seconds, and the method outperforms prior methods on color and geometry metrics.
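A shared tensorial SDF with albedo/shading decoupling can be illustrated with a minimal NumPy sketch. It assumes a triplane factorization (three axis-aligned 2D feature planes whose bilinearly sampled features are summed per 3D point) and a product shading model `color = albedo * shading`. The paper's actual factorization, decoder, and shading model may differ. `TriplaneSDF`, `bilinear_sample`, and the linear heads are hypothetical names with random, untrained weights.

```python
import numpy as np


def bilinear_sample(plane, uv):
    """Sample a (C, R, R) feature plane covering [-1, 1]^2 at (P, 2) coords."""
    c, r, _ = plane.shape
    # Map [-1, 1] to continuous pixel coordinates in [0, r - 1].
    xy = (uv + 1.0) * 0.5 * (r - 1)
    x0 = np.clip(np.floor(xy[:, 0]).astype(int), 0, r - 2)
    y0 = np.clip(np.floor(xy[:, 1]).astype(int), 0, r - 2)
    wx = xy[:, 0] - x0
    wy = xy[:, 1] - y0
    # Gather the four neighbouring texels, each of shape (C, P).
    f00 = plane[:, y0, x0]
    f01 = plane[:, y0, x0 + 1]
    f10 = plane[:, y0 + 1, x0]
    f11 = plane[:, y0 + 1, x0 + 1]
    out = (f00 * (1 - wx) * (1 - wy) + f01 * wx * (1 - wy)
           + f10 * (1 - wx) * wy + f11 * wx * wy)
    return out.T  # (P, C)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TriplaneSDF:
    """Hypothetical tensorial field: three feature planes plus linear heads."""

    def __init__(self, channels=8, res=16, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.normal(size=(3, channels, res, res))
        self.w_sdf = rng.normal(size=(channels, 1)) * 0.1
        self.w_albedo = rng.normal(size=(channels, 3)) * 0.1
        self.w_shading = rng.normal(size=(channels, 1)) * 0.1

    def query(self, pts):
        # Project each 3D point onto the xy, xz and yz planes; sum the features.
        feats = (bilinear_sample(self.planes[0], pts[:, [0, 1]])
                 + bilinear_sample(self.planes[1], pts[:, [0, 2]])
                 + bilinear_sample(self.planes[2], pts[:, [1, 2]]))
        sdf = (feats @ self.w_sdf)[:, 0]            # signed distance per point
        albedo = sigmoid(feats @ self.w_albedo)     # RGB base colour in (0, 1)
        shading = sigmoid(feats @ self.w_shading)   # scalar lighting term in (0, 1)
        color = albedo * shading                    # assumed product model
        return sdf, albedo, shading, color


if __name__ == "__main__":
    field = TriplaneSDF()
    pts = np.random.default_rng(1).uniform(-1.0, 1.0, size=(5, 3))
    sdf, albedo, shading, color = field.query(pts)
    print(sdf.shape, color.shape)
```

Under this assumed product model, relighting amounts to keeping `albedo` fixed and substituting a new `shading` term, while geometry and both appearance heads read from the same feature planes.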
Abstract
Previous efforts have managed to generate production-ready 3D assets from text or images. However, these methods primarily employ NeRF or 3D Gaussian representations, which are not adept at producing the smooth, high-quality geometry required by modern rendering pipelines. In this paper, we propose LDM, a novel feed-forward framework capable of generating high-fidelity, illumination-decoupled textured meshes from a single image or a text prompt. We first use a multi-view diffusion model to generate sparse multi-view inputs from a single image or text prompt, and then train a transformer-based model to predict a tensorial SDF field from these sparse multi-view images. Finally, we employ a gradient-based mesh optimization layer to refine this model, enabling it to produce an SDF field from which high-quality textured meshes can be extracted. Extensive experiments demonstrate that our method can generate diverse, high-quality 3D mesh assets with corresponding decomposed RGB textures within seconds.
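Extracting a mesh from an SDF field starts by locating where the SDF changes sign along grid edges. The sketch shows only that step: for an edge with endpoints a and b and SDF values s_a and s_b of opposite sign, the crossing is p = (s_a * b - s_b * a) / (s_a - s_b). A dense grid sampled from an analytic sphere SDF stands in for a predicted field. This is not the paper's gradient-based extraction layer. As we understand it, FlexiCubes builds on these edge crossings with learned per-cube weights and dual-vertex placement. `sphere_sdf` and `edge_zero_crossings` are hypothetical helpers.

```python
import numpy as np


def sphere_sdf(points, radius=0.5):
    """Exact signed distance to a sphere centred at the origin."""
    return np.linalg.norm(points, axis=-1) - radius


def edge_zero_crossings(sdf_grid, coords):
    """Return (K, 3) surface points where the SDF changes sign along grid edges.

    sdf_grid: (N, N, N) SDF values; coords: (N, N, N, 3) matching positions.
    """
    points = []
    for axis in range(3):
        # Pair every grid vertex with its neighbour along `axis`.
        n = sdf_grid.shape[axis]
        sl_a = [slice(None)] * 3
        sl_b = [slice(None)] * 3
        sl_a[axis] = slice(0, n - 1)
        sl_b[axis] = slice(1, n)
        s_a, s_b = sdf_grid[tuple(sl_a)], sdf_grid[tuple(sl_b)]
        p_a, p_b = coords[tuple(sl_a)], coords[tuple(sl_b)]
        # An edge is crossed when exactly one endpoint is inside (negative).
        crossing = (s_a < 0) != (s_b < 0)
        sa, sb = s_a[crossing], s_b[crossing]
        # Linear interpolation of the zero level set along the edge.
        t = sa / (sa - sb)
        pa, pb = p_a[crossing], p_b[crossing]
        points.append(pa + t[:, None] * (pb - pa))
    return np.concatenate(points, axis=0)


if __name__ == "__main__":
    res = 32
    lin = np.linspace(-1.0, 1.0, res)
    coords = np.stack(np.meshgrid(lin, lin, lin, indexing="ij"), axis=-1)
    verts = edge_zero_crossings(sphere_sdf(coords), coords)
    print(verts.shape, np.abs(np.linalg.norm(verts, axis=1) - 0.5).max())
```

Marching cubes connects such crossings into triangles with a fixed lookup table. A differentiable extractor instead lets gradients from rendering losses flow back into the SDF values, which is what makes gradient-based mesh refinement possible.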
