RoomDreamer: Text-Driven 3D Indoor Scene Synthesis with Coherent Geometry and Texture
Liangchen Song, Liangliang Cao, Hongyu Xu, Kai Kang, Feng Tang, Junsong Yuan, Yang Zhao
TL;DR
This work tackles text-driven editing of real indoor 3D scenes by refining both geometry and texture of a scanned mesh. It introduces Geometry Guided Diffusion to produce a coherent cubemap texture conditioned on text and depth priors, followed by Mesh Optimization that jointly updates texture and geometry via differentiable rendering and pseudo-depth supervision. A distance-map blending strategy ensures cross-face consistency, and extensive experiments on ARKitScenes demonstrate improved texture quality, geometry smoothness, and global style coherence compared with NeRF-based and image-guided baselines. The approach enables robust, style-controlled editing of real-world interiors with practical applicability on consumer-scanned data.
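To make the distance-map blending idea concrete, the following is a minimal sketch and not the authors' implementation. The summary above does not give the exact formulation, so the sketch assumes a common feathering scheme: each overlapping view is weighted per pixel by its distance to the nearest view border, and the weighted views are normalized by the total weight. The function names `border_distance_map` and `blend_overlapping`, and the simplification to horizontally offset views on a flat canvas, are hypothetical choices made for illustration.

```python
import numpy as np


def border_distance_map(height, width):
    """Per-pixel distance (in pixels, starting at 1) to the nearest image border."""
    ys = np.arange(height)
    xs = np.arange(width)
    dy = np.minimum(ys, height - 1 - ys) + 1
    dx = np.minimum(xs, width - 1 - xs) + 1
    # Distance to the closest of the four borders.
    return np.minimum(dy[:, None], dx[None, :]).astype(np.float64)


def blend_overlapping(images, offsets, canvas_width):
    """Blend equally sized views placed at horizontal offsets on a shared canvas.

    Assumption: pixels near a view's border are less reliable, so they get
    lower weight than pixels near the view's center.
    """
    height, width, channels = images[0].shape
    weight = border_distance_map(height, width)[..., None]
    accum = np.zeros((height, canvas_width, channels))
    total = np.zeros((height, canvas_width, 1))
    for img, x0 in zip(images, offsets):
        accum[:, x0:x0 + width] += img * weight
        total[:, x0:x0 + width] += weight
    # Guard against division by zero where no view covers the canvas.
    return accum / np.maximum(total, 1e-8)


# Two constant-colored views that overlap in canvas columns 4 and 5.
dark = np.zeros((4, 6, 3))
bright = np.ones((4, 6, 3))
blended = blend_overlapping([dark, bright], offsets=[0, 4], canvas_width=10)
```

Outside the overlap each view is reproduced unchanged, while inside the overlap the result transitions between the two views instead of showing a hard seam. On a cubemap the overlapping regions would come from neighboring faces rather than from horizontal offsets.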
Abstract
Techniques for capturing 3D indoor scenes are widely used, but the meshes they produce leave much to be desired. In this paper, we propose "RoomDreamer", which uses natural language prompts to synthesize a new room with a different style. Unlike existing image synthesis methods, our work addresses the challenge of simultaneously synthesizing geometry and texture that are aligned with both the input scene structure and the prompt. The key insight is that a scene should be treated as a whole, taking into account both its texture and its geometry. The proposed framework consists of two components: Geometry Guided Diffusion and Mesh Optimization. Geometry Guided Diffusion keeps the scene style consistent by applying the 2D prior to the entire scene simultaneously. Mesh Optimization jointly improves the geometry and texture and removes artifacts in the scanned scene. To validate the proposed method, we conduct extensive experiments on real indoor scenes scanned with smartphones, which demonstrate its effectiveness.
