
GVGEN: Text-to-3D Generation with Volumetric Representation

Xianglong He, Junyi Chen, Sida Peng, Di Huang, Yangguang Li, Xiaoshui Huang, Chun Yuan, Wanli Ouyang, Tong He

TL;DR

GVGEN tackles text-to-3D generation by directly producing explicit 3D Gaussians through a two-stage, coarse-to-fine approach. It first fits each object as a fixed-size GaussianVolume, using a Candidate Pool Strategy that prunes and densifies Gaussians while preserving the volumetric structure. To generate, a diffusion model produces a coarse Gaussian Distance Field (GDF), and a 3D U-Net then predicts the full GaussianVolume attributes from it. Thanks to the structured Gaussian representation and the staged generation, the method achieves competitive qualitative and quantitative results with a generation time of about 7 seconds. The authors acknowledge limitations in domain coverage and the fixed volume resolution. Even so, GVGEN lays groundwork for fast, controllable 3D content creation with Gaussians and points to extensions toward broader prompts and higher-fidelity textures with scalable volumes.
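To make the idea of a fixed-size GaussianVolume concrete, here is a minimal Python sketch. It assumes a cubic N×N×N grid spanning [-1, 1] on each axis, with exactly one Gaussian per grid point whose center is the grid point plus a learned position offset, as described for Figure 2. The class name `GaussianVolume`, the `bound` parameter, and the restriction to offsets and opacities are illustrative assumptions. A standard 3D Gaussian also carries scale, rotation, and color, which are omitted here for brevity.

```python
from itertools import product


class GaussianVolume:
    """Hypothetical minimal GaussianVolume: one Gaussian per point of an N^3 grid.

    Only position offsets and opacities are stored; scale, rotation and color
    are omitted to keep the sketch short.
    """

    def __init__(self, resolution, bound=1.0):
        if resolution < 2:
            raise ValueError("resolution must be at least 2")
        self.resolution = resolution  # N grid points per axis (assumed cubic)
        self.bound = bound            # grid spans [-bound, bound] per axis (assumption)
        n = resolution ** 3           # fixed Gaussian count; never changes
        self.offsets = [(0.0, 0.0, 0.0)] * n  # learned shift from each grid point
        self.opacities = [1.0] * n

    def grid_points(self):
        """Regular grid coordinates as (x, y, z) tuples, with z varying fastest."""
        step = 2 * self.bound / (self.resolution - 1)
        coords = [-self.bound + i * step for i in range(self.resolution)]
        return list(product(coords, coords, coords))

    def centers(self):
        """Gaussian center = fixed grid point + its position offset."""
        return [
            (gx + dx, gy + dy, gz + dz)
            for (gx, gy, gz), (dx, dy, dz) in zip(self.grid_points(), self.offsets)
        ]
```

For example, `GaussianVolume(resolution=3)` holds 27 Gaussians. Setting `vol.offsets[0] = (0.1, 0.0, -0.05)` moves the first Gaussian from the grid corner (-1, -1, -1) to roughly (-0.9, -1.0, -1.05), while the total count stays fixed. Because the attributes live on a regular grid, they can be treated as a dense 3D tensor, which is what makes a volumetric network such as a 3D U-Net applicable.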

Abstract

In recent years, 3D Gaussian splatting has emerged as a powerful technique for 3D reconstruction and generation, known for its fast, high-quality rendering. Building on this, this paper introduces a novel diffusion-based framework, GVGEN, designed to efficiently generate 3D Gaussian representations from text input. We propose two techniques: (1) Structured Volumetric Representation. We first arrange the disorganized 3D Gaussian points into a structured form, the GaussianVolume. This transformation allows intricate texture details to be captured within a volume composed of a fixed number of Gaussians. To better optimize the representation of these details, we propose a pruning and densifying method named the Candidate Pool Strategy, which enhances detail fidelity through selective optimization. (2) Coarse-to-fine Generation Pipeline. To simplify the generation of GaussianVolumes and enable the model to generate instances with detailed 3D geometry, we propose a coarse-to-fine pipeline. It first constructs a basic geometric structure and then predicts the complete Gaussian attributes. Our framework, GVGEN, demonstrates superior performance in qualitative and quantitative assessments compared to existing 3D generation methods, while maintaining a fast generation speed (~7 seconds), striking a balance between quality and efficiency. Our project page is: https://gvgen.github.io/
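The abstract states only that the Candidate Pool Strategy prunes and densifies under a fixed Gaussian count, and that it uses a pool storing pruned points. The Python sketch below is one simplified, hypothetical reading of that idea, not the paper's algorithm. Low-opacity Gaussians are moved into a pool instead of being deleted, and densification revives pooled entries for the active Gaussians with the largest gradients. The class name `CandidatePool`, the FIFO pool order, and the `min_opacity` and `grad_threshold` values are placeholder assumptions.

```python
class CandidatePool:
    """Hypothetical sketch of pruning/densification under a fixed Gaussian budget.

    Pruned Gaussians are not deleted: their indices are parked in a pool, and
    densification revives pooled indices instead of allocating new Gaussians,
    so active + pooled always equals the original count.
    """

    def __init__(self, num_gaussians):
        self.num_gaussians = num_gaussians
        self.active = set(range(num_gaussians))
        self.pool = []  # FIFO of pruned Gaussian indices

    def prune(self, opacities, min_opacity=0.005):
        """Move active Gaussians with opacity below `min_opacity` into the pool."""
        pruned = sorted(i for i in self.active if opacities[i] < min_opacity)
        for i in pruned:
            self.active.remove(i)
            self.pool.append(i)
        return pruned

    def densify(self, grad_norms, grad_threshold=2e-4):
        """Revive pooled Gaussians for the active ones with the largest gradients.

        Returns (revived_index, source_index) pairs; the caller is expected to
        initialize each revived Gaussian near its source. Stops when the pool
        is empty, so the fixed budget is never exceeded.
        """
        hot = sorted(
            (i for i in self.active if grad_norms[i] > grad_threshold),
            key=lambda i: grad_norms[i],
            reverse=True,
        )
        pairs = []
        for src in hot:
            if not self.pool:
                break
            revived = self.pool.pop(0)
            self.active.add(revived)
            pairs.append((revived, src))
        return pairs
```

The design choice being illustrated: standard 3D Gaussian splatting densifies by cloning or splitting Gaussians and prunes by deleting them, which changes the total count. Recycling pooled indices instead keeps the number of Gaussians equal to the grid size, so every object still fits the same fixed-size volume.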

Paper Structure (26 sections, 6 equations, 7 figures, 2 tables, 1 algorithm)

Figures (7)

  • Figure 1: Overview of GVGEN. Our framework comprises two stages. In the data pre-processing stage, we fit GaussianVolumes and extract their coarse geometry, the Gaussian Distance Field (GDF), as training data. In the generation stage, we first generate the GDF with a diffusion model and then feed it into a 3D U-Net that predicts the attributes of the GaussianVolume.
  • Figure 2: Illustration of GaussianVolume Fitting. We organize a fixed number of 3D Gaussians in a volumetric form, termed GaussianVolume. By using position offsets to express small movements from grid points to Gaussian centers, we can capture fine object details. The proposed Candidate Pool Strategy (CPS) enables effective pruning and densification with a pool that stores pruned points.
  • Figure 3: Visualization of GaussianVolume Fitting. The rendering results demonstrate excellent reconstruction performance of GaussianVolumes.
  • Figure 4: Comparisons with State-of-the-art Text-to-3D Methods. Our method achieves competitive visual results with better alignment to the text prompts.
  • Figure 5: Text-to-3D Generation Results by GVGEN.
  • ...and 2 more figures
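Figure 1 describes extracting a coarse Gaussian Distance Field (GDF) from each fitted GaussianVolume as training data for the diffusion stage. The summary does not define this field precisely. The Python sketch below therefore assumes one simple reading: each grid point stores its Euclidean distance to the nearest Gaussian center, optionally clamped at a truncation value. The function name `gaussian_distance_field`, the grid convention, and the `truncation` parameter are illustrative assumptions rather than the paper's definition.

```python
from itertools import product
from math import dist


def gaussian_distance_field(centers, resolution, bound=1.0, truncation=None):
    """Hypothetical GDF: distance from every grid point to its nearest Gaussian center.

    Grid points are ordered like itertools.product (z varies fastest). The
    brute-force search is O(N^3 * M): fine for a toy example, far too slow
    for real volumes, where a spatial index or GPU kernel would typically be used.
    """
    if resolution < 2:
        raise ValueError("resolution must be at least 2")
    if not centers:
        raise ValueError("need at least one Gaussian center")
    step = 2 * bound / (resolution - 1)
    coords = [-bound + i * step for i in range(resolution)]
    values = []
    for p in product(coords, coords, coords):
        d = min(dist(p, c) for c in centers)  # nearest-center Euclidean distance
        if truncation is not None:
            d = min(d, truncation)  # optional clamp, as in a truncated distance field
        values.append(d)
    return values
```

With a single Gaussian at the origin and `resolution=3`, the center grid point gets distance 0 and each corner gets √3. Under this reading, a field like this is what the diffusion model learns to generate, and the 3D U-Net then predicts the remaining GaussianVolume attributes from it.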