DreamScene: 3D Gaussian-based Text-to-3D Scene Generation via Formation Pattern Sampling

Haoran Li, Haolin Shi, Wenli Zhang, Wenjun Wu, Yong Liao, Lin Wang, Lik-hang Lee, Pengyuan Zhou

TL;DR

DreamScene addresses the bottlenecks of quality, consistency, and editing flexibility in text-to-3D scene generation by introducing Formation Pattern Sampling (FPS) over 3D Gaussians. FPS combines multi-timestep sampling, 3D Gaussian filtering, and reconstructive generation to rapidly form semantically rich, surface-focused geometries, while a progressive three-stage camera strategy ensures scene-wide 3D consistency for indoor and outdoor settings. The approach leverages DDPM/DSD-based priors and 3D Gaussian splatting to enable fast, high-quality rendering with editable object-environment integration. Experimental results show that DreamScene outperforms state-of-the-art baselines in both generation quality and editing flexibility: it generates a scene in about 1 hour, versus 7 to 12 hours for Text2NeRF, Text2Room, and ProlificDreamer, and it handles scenes with up to 20 objects, broadening potential applications in gaming, film, and architectural design.

Abstract

Text-to-3D scene generation holds immense potential for the gaming, film, and architecture sectors. Despite significant progress, existing methods struggle to maintain high quality, consistency, and editing flexibility. In this paper, we propose DreamScene, a novel 3D Gaussian-based text-to-3D scene generation framework, to tackle these three challenges mainly via two strategies. First, DreamScene employs Formation Pattern Sampling (FPS), a multi-timestep sampling strategy guided by the formation patterns of 3D objects, to quickly form semantically rich, high-quality representations. FPS uses 3D Gaussian filtering for optimization stability and leverages reconstruction techniques to generate plausible textures. Second, DreamScene employs a progressive three-stage camera sampling strategy, specifically designed for both indoor and outdoor settings, to ensure object-environment integration and scene-wide 3D consistency. Finally, DreamScene enhances scene editing flexibility by integrating objects and environments, enabling targeted adjustments. Extensive experiments validate DreamScene's superiority over current state-of-the-art techniques, heralding its wide-ranging potential for diverse applications. Code and demos will be released at https://dreamscene-project.github.io.
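
To make the idea of multi-timestep sampling more concrete, the sketch below draws several diffusion timesteps per optimization step, one from each equal sub-interval of a window whose upper bound shrinks as training progresses. This is only an illustration under stated assumptions: the abstract does not specify how timesteps are chosen, so the function name sample_timesteps, the parameters m, t_min, t_max, and min_width, and the linearly shrinking window are all hypothetical and are not taken from the DreamScene code.

```python
import random


def sample_timesteps(step, total_steps, m=4, t_min=20, t_max=980,
                     min_width=100, rng=random):
    """Draw m timesteps, one from each equal sub-interval of a shrinking window.

    Assumption (not from the paper): the window's upper bound decays linearly
    from t_max at the first step to t_min + min_width at the last step.
    """
    progress = step / max(total_steps - 1, 1)  # 0.0 at start, 1.0 at end
    upper = t_max - progress * (t_max - (t_min + min_width))
    width = (upper - t_min) / m  # size of each sub-interval
    # One uniform draw per sub-interval gives a stratified set of timesteps.
    return [
        int(rng.uniform(t_min + i * width, t_min + (i + 1) * width))
        for i in range(m)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    for step in (0, 50, 99):
        print(step, sample_timesteps(step, 100, rng=rng))
```

Sampling one timestep per sub-interval means each optimization step sees both small and large noise levels; whether DreamScene weights or combines them in this way is not stated in the abstract.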

Paper Structure (27 sections, 11 equations, 13 figures, 1 table, 1 algorithm)

Figures (13)

  • Figure 1: DreamScene compared with current SOTA text-to-3D scene generation methods. Text2NeRF [zhang2024text2nerf], Text2Room [hollein2023text2room], and ProlificDreamer [wang2024prolificdreamer] require $7\sim 12$ hours to generate the scene, while DreamScene needs only 1 hour. Moreover, DreamScene is capable of generating scenes that accommodate up to 20 objects, as shown in later figures.
  • Figure 2: The overview of DreamScene. We primarily employ Formation Pattern Sampling, which includes multi-timestep sampling, 3D Gaussian filtering, and reconstructive generation to rapidly produce high-quality and semantically rich 3D representations with plausible textures and low storage demands. Additionally, DreamScene ensures scene-wide consistency through camera sampling and allows for flexible editing by integrating objects with the environments in the scene.
  • Figure 3: Formation Pattern Sampling.
  • Figure 4: Comparison with baselines in text-to-3D generation.
  • Figure 5: Consistency results under multiple scene-wide camera poses.
  • ...and 8 more figures