ScalingGaussian: Enhancing 3D Content Creation with Generative Gaussian Splatting
Shen Chen, Jiale Zhou, Zhongyu Jiang, Tianfang Zhang, Zongkai Wu, Jenq-Neng Hwang, Lei Li
TL;DR
The paper tackles the challenge of generating high-quality 3D assets from a single image by balancing texture detail with geometric consistency. It introduces ScalingGaussian, which couples 3D Gaussian Splatting with 2D diffusion via Score Distillation Sampling, complemented by a densification pipeline consisting of a scaling module and a perturbation module. The method initializes a richer set of 3D Gaussians from sparse diffusion outputs, guides them with SDS to clone and split, converts the result to meshes, and refines textures using MSE and Gradient Profile Prior losses. This framework enables fast, single-GPU image-to-3D generation with strong geometric structure and detailed textures, demonstrated on Deep Fashion3D and related datasets, while noting limitations in depth integration and large-scale scene handling for future work.
Abstract
The creation of high-quality 3D assets is paramount for applications in digital heritage preservation, entertainment, and robotics. Traditionally, this process requires skilled professionals and specialized software for modeling, texturing, and rendering 3D objects. However, the rising demand for 3D assets in gaming and virtual reality (VR) has led to accessible image-to-3D technologies that allow non-professionals to produce 3D content and reduce dependence on expert input. Existing methods for 3D content generation struggle to achieve detailed textures and strong geometric consistency at the same time. We introduce a novel 3D content creation framework, ScalingGaussian, which combines 3D and 2D diffusion models to achieve both in generated 3D assets. First, a 3D diffusion model generates point clouds, which are then densified in three steps: selecting local regions, perturbing them with Gaussian noise, and applying local density-weighted selection. To refine the 3D Gaussians, we use a 2D diffusion model with a Score Distillation Sampling (SDS) loss, which guides the 3D Gaussians to clone and split. Finally, the 3D Gaussians are converted into meshes, and the surface textures are optimized using Mean Squared Error (MSE) and Gradient Profile Prior (GPP) losses. Our method addresses the common issue of sparse point clouds in 3D diffusion, resulting in improved geometric structure and detailed textures. Experiments on image-to-3D tasks demonstrate that our approach efficiently generates high-quality 3D assets.
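The densification step described in the abstract (select local regions, add Gaussian noise, keep points by local density-weighted selection) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function name `densify`, all parameter names and default values, the use of k-nearest-neighbour patches as "local regions", and the choice to favour candidates in sparser areas are assumptions made only for illustration; the abstract does not specify any of them.

```python
import numpy as np


def densify(points, n_regions=8, k=16, sigma=0.01, n_keep=64, m=4, seed=0):
    """Hypothetical densification sketch for an (N, 3) point cloud.

    Assumed steps (not taken from the paper's code):
      1. pick random centres and use their k nearest neighbours as local regions,
      2. perturb those points with isotropic Gaussian noise,
      3. keep n_keep perturbed points, sampled with probability proportional
         to local sparsity (mean distance to the m nearest original points).
    """
    rng = np.random.default_rng(seed)
    n = len(points)

    # 1. Local regions: k nearest neighbours of randomly chosen centres.
    centers = points[rng.choice(n, size=n_regions, replace=False)]
    dist = np.linalg.norm(points[None, :, :] - centers[:, None, :], axis=-1)
    region_idx = np.argsort(dist, axis=1)[:, :k]      # (n_regions, k)
    local = points[region_idx.reshape(-1)]            # (n_regions * k, 3)

    # 2. Perturbation: add zero-mean Gaussian noise to every region point.
    candidates = local + rng.normal(scale=sigma, size=local.shape)

    # 3. Density-weighted selection (assumption: sparser areas are favoured).
    to_orig = np.linalg.norm(candidates[:, None, :] - points[None, :, :], axis=-1)
    sparsity = np.sort(to_orig, axis=1)[:, :m].mean(axis=1) + 1e-12
    prob = sparsity / sparsity.sum()
    keep = rng.choice(len(candidates), size=n_keep, replace=False, p=prob)

    # Original points are preserved; selected candidates are appended.
    return np.concatenate([points, candidates[keep]], axis=0)


# Usage on a synthetic cloud of 200 random points.
cloud = np.random.default_rng(1).random((200, 3))
dense = densify(cloud)
print(dense.shape)  # (264, 3)
```

In this sketch the original points are never moved or dropped, so the densified cloud is a strict superset of the input. A real pipeline would then initialize one 3D Gaussian per point before the SDS-guided refinement.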
