DreamScape: 3D Scene Creation via Gaussian Splatting joint Correlation Modeling

Yueming Zhao, Xuening Yuan, Hongyu Yang, Di Huang

TL;DR

DreamScape addresses the challenge of text-to-3D scene generation with multiple objects by introducing a Gaussian Splatting-based pipeline guided by a 3D Gaussian Guide ($3{DG^2}$) derived from large language models. The method combines local, object-focused optimization with global scene alignment. It uses progressive scale control to keep training stable and a collision-aware loss to keep the composed scene physically plausible, and it handles pervasive elements such as rain and snow through sparse initialization and densification. Experiments report state-of-the-art fidelity and multi-view consistency, and the generated scenes can be edited afterwards (for example, by translating, scaling, or rotating objects). The result is more controllable text-driven 3D scene generation that is closer to practical deployment.
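
The local-to-global step can be illustrated with a minimal sketch. It assumes that each object is optimized as a set of 3D Gaussians in its own local frame, and that the $3{DG^2}$ supplies a scale, a rotation, and a translation for each object. The field names in `layout`, the yaw-only rotation, and the helpers `yaw_matrix`, `to_global`, and `compose_scene` are hypothetical illustrations, not DreamScape's actual schema or code. The one fixed piece is standard geometry: a similarity transform $x \mapsto sRx + t$ maps a Gaussian $\mathcal{N}(\mu, \Sigma)$ to $\mathcal{N}(sR\mu + t,\ s^2 R \Sigma R^\top)$.

```python
import numpy as np

# Hypothetical layout entries standing in for what an LLM-produced guide might
# hold per object: a label plus a similarity transform (scale, yaw, translation).
layout = [
    {"name": "table", "scale": 1.0, "yaw_deg": 0.0, "translation": [0.0, 0.0, 0.0]},
    {"name": "lamp", "scale": 0.3, "yaw_deg": 90.0, "translation": [0.2, 0.0, 0.8]},
]


def yaw_matrix(deg: float) -> np.ndarray:
    """Rotation about the z (up) axis by `deg` degrees."""
    a = np.deg2rad(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def to_global(means: np.ndarray, covs: np.ndarray, entry: dict):
    """Map one object's Gaussians from its local frame into the scene frame.

    means: (N, 3) Gaussian centers; covs: (N, 3, 3) covariance matrices.
    Applies x -> s R x + t, so each N(mu, Sigma) becomes
    N(s R mu + t, s^2 R Sigma R^T).
    """
    s = float(entry["scale"])
    R = yaw_matrix(entry["yaw_deg"])
    t = np.asarray(entry["translation"], dtype=float)
    g_means = s * means @ R.T + t          # scale, rotate, then translate
    g_covs = (s ** 2) * (R @ covs @ R.T)   # broadcasts over the N covariances
    return g_means, g_covs


def compose_scene(objects: dict, layout: list):
    """Concatenate every object's transformed Gaussians into one scene."""
    parts = [to_global(*objects[e["name"]], e) for e in layout]
    means = np.concatenate([m for m, _ in parts])
    covs = np.concatenate([c for _, c in parts])
    return means, covs


# Toy example: each object is a pair of small isotropic Gaussians.
local = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
iso = np.stack([np.eye(3) * 0.01] * 2)
objects = {"table": (local, iso), "lamp": (local, iso)}
scene_means, scene_covs = compose_scene(objects, layout)
```

Because such a transform touches only Gaussian parameters, repositioning, rescaling, or rotating an object after generation reduces to re-applying it with new values. This is one plausible reason editing is cheap in a Gaussian representation.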

Abstract

Recent advances in text-to-3D creation bring the strong prior of text-to-image Diffusion Models into the 3D domain. Nevertheless, generating 3D scenes with multiple objects remains challenging. To address this, we present DreamScape, a method for generating 3D scenes from text. Using Gaussian Splatting as the 3D representation, DreamScape introduces a 3D Gaussian Guide ($3{DG^2}$) that uses LLMs to encode semantic primitives, spatial transformations, and relationships from text, enabling local-to-global optimization. Progressive scale control is applied during local object generation to address the training instability that arises from naively blending objects in the global optimization stage. Collision relationships between objects are modeled at the global level to mitigate biases in LLM priors and ensure physical correctness. Additionally, to generate pervasive objects such as rain and snow that are distributed across the entire scene, we design a specialized sparse initialization and densification strategy. Experiments demonstrate that DreamScape achieves state-of-the-art performance, enabling high-fidelity, controllable 3D scene generation.
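
A collision term can be made concrete with a hedged sketch of one simple penetration penalty. The paper's actual loss is not given here, so several pieces are assumptions: the formulation itself, the axis-aligned bounding-box approximation, and the names `penetration_penalty` and `collision_loss`. The `pairs` argument of `collision_loss` is also assumed, and is meant to come from the relationships in $3{DG^2}$. Each Gaussian center of one object that falls inside another object's bounding box is penalized by its squared distance to the nearest box face.

```python
import numpy as np


def penetration_penalty(points_a: np.ndarray, points_b: np.ndarray) -> float:
    """Mean squared penetration of A's Gaussian centers into B's bounding box."""
    lo, hi = points_b.min(axis=0), points_b.max(axis=0)  # axis-aligned box of B
    # Per-axis slack to the nearer face; negative on any axis means "outside".
    slack = np.minimum(points_a - lo, hi - points_a)      # (N, 3)
    depth = np.clip(slack.min(axis=1), 0.0, None)         # 0 for outside points
    return float(np.mean(depth ** 2))


def collision_loss(objects: dict, pairs: list) -> float:
    """Symmetric penetration penalty summed over the given object pairs."""
    total = 0.0
    for name_a, name_b in pairs:
        a, b = objects[name_a], objects[name_b]
        total += penetration_penalty(a, b) + penetration_penalty(b, a)
    return total


box = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])      # object spanning the unit cube
overlap = np.array([[0.5, 0.5, 0.5], [2.0, 2.0, 2.0]])  # one center inside the cube
far = np.array([[5.0, 5.0, 5.0], [6.0, 6.0, 6.0]])      # well clear of the cube
objects = {"box": box, "overlap": overlap, "far": far}
print(collision_loss(objects, [("box", "overlap")]))  # 0.25
print(collision_loss(objects, [("box", "far")]))      # 0.0
```

NumPy keeps the sketch short. In an actual optimization loop, the same expression would be written in an autodiff framework so the penalty could push Gaussian centers apart. A bounding box is also a coarse proxy for an object's shape, so a practical version would likely use tighter geometry.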

Paper Structure (14 sections, 10 equations, 5 figures, 2 tables)

Figures (5)

  • Figure 1: DreamScape leverages Diffusion Models and LLMs to generate detailed, realistic, multi-view consistent scenes from text descriptions across a variety of scene types. The figure shows multi-view RGB images and depth maps generated by DreamScape.
  • Figure 2: Overview of our method: (a) Given a text prompt, DreamScape uses the $3{DG^2}$ generated by LLMs to interpret the scene and guide local-global training with a frozen diffusion prior. (b) The local step generates detailed objects using progressive scale control, with sparse initialization and densification for pervasive objects. (c) In the global step, objects are placed in a unified coordinate system via $3{DG^2}$ and refined across viewpoints, with the $3{DG^2}$-based collision loss ensuring detailed textures and consistent interactions.
  • Figure 3: Qualitative comparisons with state-of-the-art text-to-3D generation methods.
  • Figure 4: Visualization of ablation experiments on several DreamScape modules, demonstrating the effectiveness of each.
  • Figure 5: Demonstration of editing ability. DreamScape can edit generated scenes in real time, including translating, scaling, and rotating objects.
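
Figure 2(b) mentions sparse initialization and densification for pervasive objects. Here is a rough sketch of that idea, with its assumptions stated up front. Pervasive elements such as rain or snow start as a small number of small Gaussians scattered uniformly over the scene volume. Densification then clones Gaussians whose accumulated positional gradient exceeds a threshold, similar in spirit to the clone step of standard 3D Gaussian Splatting. The functions `sparse_init` and `densify_by_cloning`, their parameters, the scene bounds, and the uniform-sampling choice are hypothetical; DreamScape's specialized strategy may differ.

```python
import numpy as np


def sparse_init(bounds_min, bounds_max, n_points, base_scale, rng):
    """Scatter a sparse set of small, isotropic Gaussians over the scene volume.

    Returns centers (n_points, 3) and per-axis scales (n_points, 3).
    """
    lo = np.asarray(bounds_min, dtype=float)
    hi = np.asarray(bounds_max, dtype=float)
    centers = rng.uniform(lo, hi, size=(n_points, 3))  # uniform in [lo, hi)
    scales = np.full((n_points, 3), float(base_scale))
    return centers, scales


def densify_by_cloning(centers, scales, grad_norms, threshold, jitter, rng):
    """Append one jittered copy of every Gaussian whose gradient exceeds threshold."""
    mask = grad_norms > threshold
    offsets = rng.normal(0.0, jitter, size=(int(mask.sum()), 3))
    new_centers = np.concatenate([centers, centers[mask] + offsets])
    new_scales = np.concatenate([scales, scales[mask]])
    return new_centers, new_scales


rng = np.random.default_rng(0)
# Hypothetical scene bounds for falling snow: 4 x 4 in x/y, 3 units tall.
centers, scales = sparse_init([-2.0, -2.0, 0.0], [2.0, 2.0, 3.0], 200, 0.01, rng)
# Stand-in gradients; during training these would be accumulated from the loss.
grad_norms = rng.random(200)
centers, scales = densify_by_cloning(centers, scales, grad_norms, 0.9, 0.005, rng)
```

Starting sparse keeps the initial particle count low for elements that span the whole scene. Gradient-driven cloning then adds Gaussians only where the optimization signal calls for them.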