CHOrD: Generation of Collision-Free, House-Scale, and Organized Digital Twins for 3D Indoor Scenes with Controllable Floor Plans and Optimal Layouts

Chong Su, Yingbin Fu, Zheyuan Hu, Jing Yang, Param Hanji, Shaojun Wang, Xuan Zhao, Cengiz Öztireli, Fangcheng Zhong

TL;DR

CHOrD tackles the problem of scalable, collision-free 3D indoor scene synthesis for house-scale digital twins by introducing a 2D image-based intermediate layout and a diffusion-based pipeline that generates coherent floor plans conditioned on multi-modal inputs. The approach combines automatic hierarchical scene graph extraction with multi-level object retrieval and photorealistic rendering, supported by a new CHOrD dataset with expanded item coverage and higher data quality. Key contributions include the diffusion-based layout generator, hierarchical scene graphs, text and open-plan conditioned floor planning, and a publicly released dataset that enables house-scale evaluation. The results demonstrate state-of-the-art performance on both 3D-FRONT and the CHOrD dataset, achieving near-elimination of collisions, diverse layouts, and photorealistic renderings suitable for design, robotics, and embodied AI applications.

Abstract

We introduce CHOrD, a novel framework for scalable synthesis of 3D indoor scenes, designed to create house-scale, collision-free, and hierarchically structured indoor digital twins. In contrast to existing methods that directly synthesize the scene layout as a scene graph or object list, CHOrD incorporates a 2D image-based intermediate layout representation, enabling effective prevention of collision artifacts by successfully capturing them as out-of-distribution (OOD) scenarios during generation. Furthermore, unlike existing methods, CHOrD is capable of generating scene layouts that adhere to complex floor plans with multi-modal controls, enabling the creation of coherent, house-wide layouts robust to both geometric and semantic variations in room structures. Additionally, we propose a novel dataset with expanded coverage of household items and room configurations, as well as significantly improved data quality. CHOrD demonstrates state-of-the-art performance on both the 3D-FRONT and our proposed datasets, delivering photorealistic, spatially coherent indoor scene synthesis adaptable to arbitrary floor plan variations.

Paper Structure

This paper contains 33 sections, 2 equations, 17 figures, and 7 tables.

Figures (17)

  • Figure 1: I) CHOrD synthesizes realistic and well-structured digital twins for 3D indoor scenes. II) CHOrD can be conditioned on complex floor plan structures to generate realistic house-wide layouts while ensuring physically plausible, spatially coherent, and collision-free arrangements. It further introduces a hierarchical data structure that organizes objects not only at the room level but also at finer scales, such as desks and coffee tables. III, IV) CHOrD supports controllable floor plans via multimodal inputs, as well as photorealistic, 3D-consistent rendering. These capabilities equip CHOrD with considerable versatility, enabling a broad range of downstream applications.
  • Figure 2: Overview of CHOrD. First, we generate the scene layout using a conditional diffusion model, conditioned on a floor plan image. Next, we apply object detection to identify individual household items and use a structured scene graph to hierarchically organize the spatial relationships between rooms and objects, along with their attributes. Finally, the scene is rendered into photorealistic images.
  • Figure 3: Scene graph extraction and object retrieval.
  • Figure 4: Erroneous scenes in 3D-FRONT.
  • Figure 5: Visualization of layouts synthesized by CHOrD, DiffuScene [tang2024diffuscene], InstructScene [lin2024instructscene], and PhyScene [yang2024physcene]. All results were randomly selected from an arbitrary batch without any cherry-picking. Only CHOrD produces clean, collision-free layouts; the other methods exhibit significant artifacts such as implausibly overlapping items, inconsistent orientations, or missing objects.
  • ...and 12 more figures
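
The Figure 1 and Figure 2 captions describe a hierarchical structure that organizes objects at the house, room, and finer (for example, coffee table) levels, with the goal of collision-free arrangements. The following Python sketch illustrates that idea only in spirit. The names `SceneNode`, `overlap_area`, `sibling_collisions`, and `build_example`, the axis-aligned box format, and all coordinates are hypothetical and not taken from the paper. According to the abstract, CHOrD prevents collisions through its 2D image-based layout representation rather than through an explicit geometric check like the one shown here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Axis-aligned 2D box in floor-plan coordinates: (x_min, y_min, x_max, y_max).
# This box format is an assumption made for illustration.
Box = Tuple[float, float, float, float]


@dataclass
class SceneNode:
    """One node of a hypothetical hierarchical scene graph."""
    name: str
    box: Box
    children: List["SceneNode"] = field(default_factory=list)

    def add(self, child: "SceneNode") -> "SceneNode":
        """Attach a child node and return it so calls can be chained."""
        self.children.append(child)
        return child


def overlap_area(a: Box, b: Box) -> float:
    """Area of intersection of two axis-aligned boxes (zero if disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)


def sibling_collisions(node: SceneNode) -> List[Tuple[str, str]]:
    """Recursively list pairs of sibling nodes whose boxes overlap."""
    pairs: List[Tuple[str, str]] = []
    kids = node.children
    for i in range(len(kids)):
        for j in range(i + 1, len(kids)):
            if overlap_area(kids[i].box, kids[j].box) > 0.0:
                pairs.append((kids[i].name, kids[j].name))
    for kid in kids:
        pairs.extend(sibling_collisions(kid))
    return pairs


def build_example() -> SceneNode:
    """Build a toy house -> room -> furniture -> small-item hierarchy."""
    house = SceneNode("house", (0, 0, 10, 8))
    living = house.add(SceneNode("living_room", (0, 0, 6, 8)))
    bedroom = house.add(SceneNode("bedroom", (6, 0, 10, 8)))
    table = living.add(SceneNode("coffee_table", (2, 2, 3.5, 3)))
    living.add(SceneNode("sofa", (2, 3.5, 5, 4.5)))
    table.add(SceneNode("vase", (2.2, 2.2, 2.5, 2.5)))  # item on the table
    bedroom.add(SceneNode("bed", (7, 1, 9, 4)))
    # Deliberately placed so that it overlaps the bed.
    bedroom.add(SceneNode("wardrobe", (8.5, 3.5, 10, 5)))
    return house


if __name__ == "__main__":
    print(sibling_collisions(build_example()))
```

Checking only siblings keeps the sketch consistent with a hierarchy in which a child (such as a vase on a coffee table) is expected to lie within its parent's footprint, so parent-child overlap is not treated as a collision.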