GradeADreamer: Enhanced Text-to-3D Generation Using Gaussian Splatting and Multi-View Diffusion

Trapoom Ukarapol; Kevin Pruvost

TL;DR

GradeADreamer addresses two key challenges in text-to-3D generation: view inconsistency and long convergence times. It proposes a three-stage pipeline that first generates a Gaussian Splats prior with a Multi-view Diffusion Model, then refines the geometry with StableDiffusion, and finally optimizes texture on a mesh under diffusion guidance, using Score Distillation Sampling (SDS) in all three stages. In qualitative and quantitative evaluations, GradeADreamer achieves the highest average user ranking, the lowest 3D-FID, and a substantially lower incidence of the Multi-face Janus problem than prior methods, while running on a single RTX 3090 in under 30 minutes per asset. The work offers a practical, efficient route to high-quality 3D assets and shows how diffusion priors and texture optimization can be combined effectively for text-to-3D generation.

Abstract

Text-to-3D generation has shown promising results, yet common challenges remain, such as the Multi-face Janus problem and long generation times for high-quality assets. In this paper, we address these issues by introducing a novel three-stage training pipeline called GradeADreamer. This pipeline produces high-quality assets with a total generation time of under 30 minutes on a single RTX 3090 GPU. Our method employs a Multi-view Diffusion Model, MVDream, to generate Gaussian Splats as a prior, and then refines geometry and texture using StableDiffusion. Experimental results demonstrate that our approach significantly mitigates the Multi-face Janus problem and achieves the highest average user preference ranking among the compared state-of-the-art methods. The project code is available at https://github.com/trapoom555/GradeADreamer.
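
Since the TL;DR states that SDS guides every stage, a toy illustration of a single SDS update may help. The sketch below is not the GradeADreamer implementation. It assumes a NumPy-only setting in which the "rendered image" is optimized directly (no renderer Jacobian), the weighting is w(t) = 1 - alpha_bar (one common choice, assumed here), and the hypothetical `toy_noise_pred` replaces MVDream or StableDiffusion with the exact noise prediction for a data distribution consisting of a single target image.

```python
import numpy as np


def toy_noise_pred(x_t, alpha_bar, target):
    # Hypothetical stand-in for a diffusion model: the exact noise prediction
    # when the "data distribution" is a single image `target`.
    return (x_t - np.sqrt(alpha_bar) * target) / np.sqrt(1.0 - alpha_bar)


def sds_grad(x0, target, alpha_bar, rng):
    # Forward-diffuse the current image to noise level alpha_bar.
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
    # SDS gradient: weighted residual between predicted and injected noise.
    eps_hat = toy_noise_pred(x_t, alpha_bar, target)
    w = 1.0 - alpha_bar  # assumed weighting, not taken from the paper
    return w * (eps_hat - eps)


def optimize(x0, target, steps=200, lr=0.5, seed=0):
    # Plain gradient descent on the image with a random noise level per step.
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(steps):
        alpha_bar = rng.uniform(0.02, 0.98)
        x -= lr * sds_grad(x, target, alpha_bar, rng)
    return x
```

For example, `optimize(np.zeros((4, 4, 3)), np.full((4, 4, 3), 0.25))` drives the image toward the target. With this toy predictor the injected noise cancels exactly, so the gradient is a scaled copy of `x0 - target`. In a real pipeline the predictor would be a text-conditioned diffusion model and the gradient would be backpropagated through the renderer to the Gaussian Splats or mesh texture parameters.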

Paper Structure (36 sections, 1 equation, 8 figures, 5 tables)

Introduction
Related Works
3D Representations
Score Distillation Methods
Mesh Extraction from Gaussian Splatting
Approach
3D Representation Choices
Training Stages
Gaussian Splats Prior Generation
Gaussian Splats Refinement
Texture Optimization
Score Distillation Choices
Experiments
Implementation Details
Training Resources
...and 21 more sections

Figures (8)

Figure 1: High-quality assets generated by GradeADreamer
Figure 2: Examples of the Multi-Face Janus problem [wiki:janus] (generated with ProlificDreamer [wang2023prolificdreamer])
Figure 3: Overview of GradeADreamer. The proposed method consists of three optimization steps. The first step optimizes random Gaussian Splats using MVDream to obtain a Gaussian Splats prior (see the section "Gaussian Splats Prior Generation"). In the second step, this prior is refined using StableDiffusion (see the section "Gaussian Splats Refinement"). Finally, the third step applies texture optimization on a mesh, guided by StableDiffusion (see the section "Texture Optimization").
Figure 4: Qualitative results
Figure 5: Rank distribution comparison from user study
...and 3 more figures
