Generating Surface for Text-to-3D using 2D Gaussian Splatting
Huanning Dong, Fan Li, Ping Kuang, Jianwen Min
TL;DR
This work introduces DirectGaussian, a surfel-based Text-to-3D framework that renders object surfaces using 2D Gaussian splatting guided by text-conditioned multi-view priors. It constructs a Gaussian surfel dataset (TextGaussian) and learns coarse surfels via a multi-head attention model, then refines them through a four-view optimization that enforces texture/normal consistency and global geometric coherence with curvature and convergence constraints. The method achieves diverse, high-fidelity textured 3D surfaces and shows robustness to unseen viewpoints, outperforming several Gaussian-splatting baselines in qualitative quality and user preference, while maintaining efficient rendering. By integrating 360° surface curvature constraints with diffusion-based priors, DirectGaussian enables practical text-to-3D content creation suitable for animation, VR, and game pipelines.
Abstract
Recent advances in Text-to-3D modeling have shown significant potential for 3D content creation. However, because objects in the natural world have complex geometric shapes, generating 3D content remains challenging. Current methods either leverage 2D diffusion priors to recover 3D geometry or train a model directly on a specific 3D representation. In this paper, we propose DirectGaussian, a novel method that focuses on generating the surfaces of 3D objects represented by surfels. DirectGaussian uses text-conditioned generation models and renders the surface of a 3D object by 2D Gaussian splatting with multi-view normal and texture priors. To address multi-view geometric inconsistency, DirectGaussian incorporates curvature constraints on the generated surface during the optimization process. Through extensive experiments, we demonstrate that our framework achieves diverse and high-fidelity 3D content creation.
