ScalingGaussian: Enhancing 3D Content Creation with Generative Gaussian Splatting

Shen Chen, Jiale Zhou, Zhongyu Jiang, Tianfang Zhang, Zongkai Wu, Jenq-Neng Hwang, Lei Li

TL;DR

The paper tackles the challenge of generating high-quality 3D assets from a single image by balancing texture detail with geometric consistency. It introduces ScalingGaussian, which couples 3D Gaussian Splatting with 2D diffusion via Score Distillation Sampling, complemented by a densification pipeline consisting of a scaling module and a perturbation module. The method initializes a richer set of 3D Gaussians from sparse diffusion outputs, guides them with SDS to clone and split, converts the result to meshes, and refines textures using MSE and Gradient Profile Prior losses. This framework enables fast, single-GPU image-to-3D generation with strong geometric structure and detailed textures, demonstrated on Deep Fashion3D and related datasets, while noting limitations in depth integration and large-scale scene handling for future work.

Abstract

The creation of high-quality 3D assets is paramount for applications in digital heritage preservation, entertainment, and robotics. Traditionally, this process necessitates skilled professionals and specialized software for the modeling, texturing, and rendering of 3D objects. However, the rising demand for 3D assets in gaming and virtual reality (VR) has led to the creation of accessible image-to-3D technologies, allowing non-professionals to produce 3D content and decreasing dependence on expert input. Existing methods for 3D content generation struggle to simultaneously achieve detailed textures and strong geometric consistency. We introduce a novel 3D content creation framework, ScalingGaussian, which combines 3D and 2D diffusion models to achieve detailed textures and geometric consistency in generated 3D assets. Initially, a 3D diffusion model generates point clouds, which are then densified by selecting local regions, introducing Gaussian noise, and applying local density-weighted selection. To refine the 3D Gaussians, we utilize a 2D diffusion model with Score Distillation Sampling (SDS) loss, guiding the 3D Gaussians to clone and split. Finally, the 3D Gaussians are converted into meshes, and the surface textures are optimized using Mean Squared Error (MSE) and Gradient Profile Prior (GPP) losses. Our method addresses the common issue of sparse point clouds in 3D diffusion, resulting in improved geometric structure and detailed textures. Experiments on image-to-3D tasks demonstrate that our approach efficiently generates high-quality 3D assets.
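
The densification step named in the abstract (select local regions, add Gaussian noise, keep points by local density) can be illustrated with a rough NumPy sketch. This is not the authors' implementation. The function name `densify_point_cloud`, the regular-grid definition of a "local region", the radius-count density estimate, and every parameter value are assumptions made only for illustration.

```python
import numpy as np

def densify_point_cloud(points, colors, grid=4, new_per_region=8,
                        color_sigma=0.02, radius=0.1, seed=0):
    """Hypothetical sketch of region-wise point densification (not the paper's code).

    points: (N, 3) float array of positions; colors: (N, 3) floats in [0, 1].
    Returns the original points followed by the retained new points, with colors.
    """
    rng = np.random.default_rng(seed)
    lo, hi = points.min(axis=0), points.max(axis=0)
    # Assumption: local regions are the cells of a regular grid over the bounding box.
    cell = np.minimum(((points - lo) / (hi - lo + 1e-9) * grid).astype(int), grid - 1)
    region_ids = cell[:, 0] * grid * grid + cell[:, 1] * grid + cell[:, 2]

    cand_pts, cand_cols, cand_density = [], [], []
    for rid in np.unique(region_ids):
        region = points[region_ids == rid]
        # Sample candidate points uniformly inside this region's own bounding box.
        cand = rng.uniform(region.min(axis=0), region.max(axis=0),
                           size=(new_per_region, 3))
        # Brute-force distances from each candidate to every original point.
        dist = np.linalg.norm(cand[:, None, :] - points[None, :, :], axis=-1)
        # Color: copy the nearest original point's color and add Gaussian noise.
        noisy = colors[dist.argmin(axis=1)] + rng.normal(0.0, color_sigma, (new_per_region, 3))
        cand_pts.append(cand)
        cand_cols.append(np.clip(noisy, 0.0, 1.0))
        # Assumption: local density = number of original points within `radius`.
        cand_density.append((dist < radius).sum(axis=1).astype(float))

    cand_pts = np.concatenate(cand_pts)
    cand_cols = np.concatenate(cand_cols)
    density = np.concatenate(cand_density)
    # Density-weighted selection: denser neighborhoods are more likely to be kept.
    keep = rng.random(len(density)) < density / max(density.max(), 1.0)
    return (np.vstack([points, cand_pts[keep]]),
            np.vstack([colors, cand_cols[keep]]))
```

Calling `densify_point_cloud(points, colors)` on an `(N, 3)` cloud returns at least the original `N` points. The brute-force distance computation keeps the sketch short; a spatial index such as a k-d tree would be the usual choice for large clouds.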

Paper Structure

This paper contains 29 sections, 31 equations, 9 figures, and 3 tables.

Figures (9)

  • Figure 1: We propose a 3D content creation framework, called ScalingGaussian, which efficiently optimizes 3D Gaussian Splatting through 2D and 3D diffusion and produces 3D assets with rich details and consistent structure.
  • Figure 2: Overall framework. First, the 3D diffusion model generates the initial point clouds. Then, scaling and perturbation modules are applied to these point clouds to enhance the structural characteristics of the object. Scaling points are used to create initialized 3D Gaussians, which are then optimized using SDS loss in conjunction with the 2D diffusion model. Finally, images are rendered through 3D Gaussian Splatting and a textured mesh is extracted and refined using the 2D diffusion model, where MSE loss and GPP loss are applied to enhance texture details.
  • Figure 3: Scaling points based on the intrinsic characteristics of the object structure. The point cloud is divided into multiple local regions, with each region processed separately to enhance point density and color retention. Initially, new points are uniformly generated within each region, and Gaussian noise is added to the colors of the nearest neighbor points. Subsequently, a density function is constructed to retain points based on their local density.
  • Figure 4: Gradient field and Gradient Profile Prior Loss. The gradient field of a clear image is sharper than that of a blurred image.
  • Figure 5: Qualitative comparisons of the generative quality on Deep Fashion3D.
  • ...and 4 more figures
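
The Figure 4 caption states that a clear image has a sharper gradient field than a blurred one, which is the intuition behind pairing MSE with a Gradient Profile Prior loss during texture refinement. The NumPy sketch below uses one common formulation of a gradient profile loss, the L1 distance between gradient-magnitude fields; the paper's exact definition may differ. The names `gradient_magnitude`, `gpp_loss`, and `texture_loss`, and the weight `gpp_weight`, are hypothetical.

```python
import numpy as np

def gradient_magnitude(img):
    """Per-pixel gradient magnitude of a 2D grayscale image (finite differences)."""
    gy, gx = np.gradient(img.astype(float))
    return np.sqrt(gx ** 2 + gy ** 2)

def gpp_loss(pred, target):
    """Assumed form: mean L1 distance between the two gradient-magnitude fields."""
    return np.abs(gradient_magnitude(pred) - gradient_magnitude(target)).mean()

def texture_loss(pred, target, gpp_weight=0.1):
    """MSE plus a weighted gradient term; `gpp_weight` is an illustrative value."""
    mse = ((pred.astype(float) - target.astype(float)) ** 2).mean()
    return mse + gpp_weight * gpp_loss(pred, target)

# Toy data: a vertical step edge and a horizontally box-blurred copy of it.
sharp = np.zeros((16, 16))
sharp[:, 8:] = 1.0
kernel = np.ones(5) / 5.0
blurred = np.apply_along_axis(
    lambda row: np.convolve(np.pad(row, 2, mode="edge"), kernel, mode="valid"),
    1, sharp)

# The sharp edge has the larger peak gradient, matching the Figure 4 caption.
print(gradient_magnitude(sharp).max() > gradient_magnitude(blurred).max())
print(texture_loss(blurred, sharp))
```

In this toy case the gradient term penalizes the blurred image for spreading the edge over several pixels, a difference that a per-pixel MSE alone weights less strongly.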