SceneCraft: Layout-Guided 3D Scene Generation
Xiuyu Yang, Yunze Man, Jun-Kun Chen, Yu-Xiong Wang
TL;DR
SceneCraft addresses text-and-layout-guided 3D indoor scene generation with a rendering-based pipeline that converts 3D semantic layouts into multi-view 2D proxy maps and learns a NeRF as the final scene representation. The approach comprises a user-friendly Bounding-Box Scene (BBS) layout interface, a 2D diffusion model, SceneCraft2D, conditioned on bounding-box images, and a distillation-based 3D synthesis stage with annealing, a layout-aware depth constraint, and texture consolidation. It supports free camera trajectories and complex multi-room layouts, achieving state-of-the-art 3D indoor scene generation on datasets such as ScanNet++ and Hypersim. The results show improved geometric consistency, diverse textures, and realistic visual quality. Code and more results are available at https://orangesodahub.github.io/SceneCraft
Abstract
The creation of complex 3D scenes tailored to user specifications has been a tedious and challenging task with traditional 3D modeling tools. Although some pioneering methods have achieved automatic text-to-3D generation, they are generally limited to small-scale scenes with restricted control over the shape and texture. We introduce SceneCraft, a novel method for generating detailed indoor scenes that adhere to textual descriptions and spatial layout preferences provided by users. Central to our method is a rendering-based technique, which converts 3D semantic layouts into multi-view 2D proxy maps. Furthermore, we design a semantic- and depth-conditioned diffusion model to generate multi-view images, which are used to learn a neural radiance field (NeRF) as the final scene representation. Without the constraints of panorama image generation, we surpass previous methods in supporting complicated indoor space generation beyond a single room, even as complicated as a whole multi-bedroom apartment with irregular shapes and layouts. Through experimental analysis, we demonstrate that our method significantly outperforms existing approaches in complex indoor scene generation with diverse textures, consistent geometry, and realistic visual quality. Code and more results are available at: https://orangesodahub.github.io/SceneCraft
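
The rendering step described above, which turns a 3D semantic layout into per-view 2D proxy maps, can be illustrated with a minimal sketch. This is not the authors' implementation: the `Box` class, the `render_proxy_maps` function, the restriction to axis-aligned boxes, and the pinhole camera looking down +z are all assumptions made for illustration. The sketch casts one ray per pixel and records the semantic id and ray distance of the nearest box hit, producing the kind of semantic and depth maps that a conditioned diffusion model could take as input.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Box:
    """Axis-aligned box with a semantic class id (hypothetical simplification)."""
    lo: Vec3    # minimum corner (x, y, z)
    hi: Vec3    # maximum corner (x, y, z)
    label: int  # semantic class id; 0 is reserved for background


def ray_box_distance(origin: Vec3, direction: Vec3, box: Box) -> Optional[float]:
    """Slab test: distance along the ray to the box entry point, or None on a miss."""
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box.lo, box.hi):
        if abs(d) < 1e-12:
            # Ray is parallel to this slab: it must already lie inside it.
            if o < lo or o > hi:
                return None
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return None
    return t_near


def render_proxy_maps(boxes: List[Box], cam_pos: Vec3,
                      width: int = 9, height: int = 7, focal: float = 6.0):
    """Render a semantic map and a ray-distance depth map for one camera view.

    The camera is an assumed pinhole at cam_pos looking down +z, with the
    focal length given in pixels. Pixels that hit no box keep label 0 and
    depth infinity.
    """
    semantic = [[0] * width for _ in range(height)]
    depth = [[math.inf] * width for _ in range(height)]
    for v in range(height):
        for u in range(width):
            # Unit ray direction through the center of pixel (u, v).
            dx = (u + 0.5 - width / 2) / focal
            dy = (v + 0.5 - height / 2) / focal
            norm = math.sqrt(dx * dx + dy * dy + 1.0)
            direction = (dx / norm, dy / norm, 1.0 / norm)
            for box in boxes:
                t = ray_box_distance(cam_pos, direction, box)
                # Keep the nearest hit so closer boxes occlude farther ones.
                if t is not None and t < depth[v][u]:
                    depth[v][u] = t
                    semantic[v][u] = box.label
    return semantic, depth


if __name__ == "__main__":
    # Toy layout: a small box (label 1) in front of a wall-like slab (label 2).
    layout = [Box((-1, -1, 4), (1, 1, 6), 1), Box((-10, -10, 8), (10, 10, 9), 2)]
    sem, _ = render_proxy_maps(layout, (0.0, 0.0, 0.0))
    for row in sem:
        print("".join(str(label) for label in row))
```

Repeating the call with different camera positions along a trajectory would give one pair of maps per view. Oriented boxes and arbitrary camera rotations, which a real layout interface would need, are left out to keep the sketch short.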
