SceneCraft: Layout-Guided 3D Scene Generation

Xiuyu Yang, Yunze Man, Jun-Kun Chen, Yu-Xiong Wang

TL;DR

SceneCraft addresses the challenge of text-and-layout-guided 3D indoor scene generation by introducing a rendering-based pipeline that converts 3D semantic layouts into multi-view 2D proxy maps and learns a NeRF-based final scene representation. The approach comprises a user-friendly Bounding-Box Scene (BBS) layout interface, a 2D diffusion model SceneCraft2D conditioned on bounding-box images, and a distillation-based 3D synthesis stage with annealing, a layout-aware depth constraint, and texture consolidation. It supports free camera trajectories and complex multi-room layouts beyond single rooms, achieving state-of-the-art 3D indoor scene generation on datasets like ScanNet++ and Hypersim. The results show improved geometric consistency, diverse textures, and realistic visual quality, with code and results available from the authors.

Abstract

The creation of complex 3D scenes tailored to user specifications has been a tedious and challenging task with traditional 3D modeling tools. Although some pioneering methods have achieved automatic text-to-3D generation, they are generally limited to small-scale scenes with restricted control over the shape and texture. We introduce SceneCraft, a novel method for generating detailed indoor scenes that adhere to textual descriptions and spatial layout preferences provided by users. Central to our method is a rendering-based technique, which converts 3D semantic layouts into multi-view 2D proxy maps. Furthermore, we design a semantic and depth conditioned diffusion model to generate multi-view images, which are used to learn a neural radiance field (NeRF) as the final scene representation. Without the constraints of panorama image generation, we surpass previous methods in supporting complicated indoor space generation beyond a single room, even as complicated as a whole multi-bedroom apartment with irregular shapes and layouts. Through experimental analysis, we demonstrate that our method significantly outperforms existing approaches in complex indoor scene generation with diverse textures, consistent geometry, and realistic visual quality. Code and more results are available at: https://orangesodahub.github.io/SceneCraft
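The rendering-based step described above, which turns a 3D semantic layout into per-view 2D proxy maps, can be illustrated with a small ray-casting sketch. This is not the authors' renderer. It assumes axis-aligned boxes, a pinhole camera with no rotation looking along +z, and depth measured as distance along the ray. The function name `render_bbs` and its parameters are hypothetical.

```python
import numpy as np

def render_bbs(boxes, cam_pos, H=48, W=64, focal=40.0):
    """Ray-cast axis-aligned boxes into a semantic map and a coarse depth map.

    boxes:   list of (min_xyz, max_xyz, class_id) tuples (hypothetical layout format)
    cam_pos: camera position; the camera looks along +z with no rotation (assumption)
    """
    cam_pos = np.asarray(cam_pos, dtype=float)

    # One ray per pixel centre for a simple pinhole camera.
    u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)
    dirs = np.stack([(u - W / 2) / focal, (v - H / 2) / focal, np.ones_like(u)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)

    semantic = np.zeros((H, W), dtype=np.int32)  # 0 means "no box hit"
    depth = np.full((H, W), np.inf)              # distance along the ray

    for bmin, bmax, cls in boxes:
        # Slab method: intersect each ray with the three pairs of box planes.
        with np.errstate(divide="ignore", invalid="ignore"):
            t0 = (np.asarray(bmin, dtype=float) - cam_pos) / dirs
            t1 = (np.asarray(bmax, dtype=float) - cam_pos) / dirs
        t_near = np.minimum(t0, t1).max(axis=-1)
        t_far = np.maximum(t0, t1).min(axis=-1)

        # Keep hits in front of the camera that are closer than anything drawn so far.
        closer = (t_near <= t_far) & (t_near > 0) & (t_near < depth)
        semantic[closer] = cls
        depth[closer] = t_near[closer]

    return semantic, depth
```

Calling `render_bbs` once per camera pose along a trajectory would yield one semantic map and one depth map per view, which is the kind of multi-view conditioning signal the abstract refers to. A full implementation would also need camera rotation and oriented boxes.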

Paper Structure

This paper contains 18 sections, 1 equation, 13 figures, 1 table.

Figures (13)

  • Figure 1: Our novel method generates complex and detailed indoor scenes from 3D spatial layouts and textual descriptions. Given user-specified layouts represented as a "Bounding Box Scene (BBS)," our method renders batches of 2D layouts and coarse depth maps and then transforms them into high-quality 3D scenes.
  • Figure 2: SceneCraft is a novel framework for layout-guided scene generation, which allows users to provide the layout as a bounding-box scene (BBS), a user-friendly layout format that guides the generation. Our framework contains two stages: (a) pre-training of a 2D diffusion model, SceneCraft2D, to solve the 2D version of the layout-guided scene generation task, and (b) distillation of SceneCraft2D to learn a scene representation of the generated scene.
  • Figure 3: Generation results of SceneCraft on room layouts provided by Hypersim. For each sample, we show the 3D BBS, the bounding-box image (BBI) semantic maps, the generated RGB images of the scene, and the rendered depth map. Our method is able to generate complex and free-form scenes from challenging room layouts.
  • Figure 4: Qualitative comparisons of SceneCraft and baseline approaches. We show our generated color and depth renderings under two common layout conditions (a bedroom and a living room) alongside three other baselines. SceneCraft follows the layout conditions more faithfully and is capable of handling more complex scenarios.
  • Figure 5: Generation results of SceneCraft in complex scenes. We demonstrate SceneCraft's ability to generate more complex indoor scenes by leveraging arbitrary camera trajectories. Such irregular room shapes cannot be naturally achieved by previous work.
  • ...and 8 more figures