DeBaRA: Denoising-Based 3D Room Arrangement Generation

Léopold Maillard, Nicolas Sereyjol-Garros, Tom Durand, Maks Ovsjanikov

TL;DR

DeBaRA is a score-based model specifically tailored for precise, controllable and flexible arrangement generation in a bounded environment; extensive experiments show significant improvements over state-of-the-art approaches in a range of scenarios.

Abstract

Generating realistic and diverse layouts of furnished indoor 3D scenes unlocks multiple interactive applications impacting a wide range of industries. The inherent complexity of object interactions, the limited amount of available data and the requirement to fulfill spatial constraints all make generative modeling for 3D scene synthesis and arrangement challenging. Current methods address these challenges either autoregressively or with off-the-shelf diffusion objectives that predict all attributes simultaneously, without explicit 3D reasoning. In this paper, we introduce DeBaRA, a score-based model specifically tailored for precise, controllable and flexible arrangement generation in a bounded environment. We argue that the most critical component of a scene synthesis system is to accurately establish the size and position of various objects within a restricted area. Based on this insight, we propose a lightweight conditional score-based model designed with 3D spatial awareness at its core. We demonstrate that by focusing on the spatial attributes of objects, a single trained DeBaRA model can be leveraged at test time to perform several downstream applications such as scene synthesis, completion and re-arrangement. Further, we introduce a novel Self Score Evaluation procedure so that the model can be optimally employed alongside external LLMs. We evaluate our approach through extensive experiments and demonstrate significant improvements over state-of-the-art approaches in a range of scenarios.
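The claim that one trained model serves synthesis, completion and re-arrangement can be illustrated with a generic score-based sampler. This is a minimal sketch under stated assumptions, not DeBaRA's implementation: it assumes a hypothetical `denoiser(x, sigma)` that returns predicted clean box attributes, an EDM-style (Karras) noise schedule with assumed defaults (`sigma_min=0.002`, `rho=7`), a deterministic Euler solver, and replacement-based inpainting. Following the Figure 1 caption, synthesis starts from noise at a large `sigma_max`, re-arrangement starts from an existing layout at a smaller `sigma_max`, and completion holds already-placed attributes fixed through a boolean `mask`.

```python
import numpy as np


def karras_sigmas(sigma_max, sigma_min=0.002, n_steps=40, rho=7.0):
    """Decreasing EDM-style noise schedule from sigma_max to sigma_min, followed by 0."""
    ramp = np.linspace(0.0, 1.0, n_steps)
    inv_max, inv_min = sigma_max ** (1.0 / rho), sigma_min ** (1.0 / rho)
    sigmas = (inv_max + ramp * (inv_min - inv_max)) ** rho
    return np.append(sigmas, 0.0)


def sample_layout(denoiser, x_init, sigma_max, known=None, mask=None, n_steps=40, seed=0):
    """Deterministic Euler sampler over an (n_objects, n_attrs) array of box attributes.

    denoiser(x, sigma) -> predicted clean layout (hypothetical interface).
    x_init: zeros for synthesis from scratch, or a messy layout for re-arrangement.
    known, mask: optional arrays shaped like x_init; entries where mask is True
        are held at `known` (inpainting, e.g. scene completion).
    """
    rng = np.random.default_rng(seed)
    sigmas = karras_sigmas(sigma_max, n_steps=n_steps)
    # Start from the initial layout perturbed at the chosen noise level sigma_max.
    x = x_init + sigmas[0] * rng.standard_normal(x_init.shape)
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        if mask is not None:
            # Overwrite known attributes with a copy noised to the current level.
            x = np.where(mask, known + sigma * rng.standard_normal(x.shape), x)
        # Probability-flow ODE direction: dx/dsigma = (x - D(x, sigma)) / sigma.
        d = (x - denoiser(x, sigma)) / sigma
        x = x + (sigma_next - sigma) * d
    if mask is not None:
        x = np.where(mask, known, x)  # known attributes are returned exactly
    return x
```

With a large `sigma_max` the starting layout is effectively erased, whereas a small one only refines it; under these assumptions, a single noise-level knob plus an optional mask is enough to switch between the three tasks.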

Paper Structure (65 sections, 11 equations, 16 figures, 9 tables, 2 algorithms)

Figures (16)

  • Figure 1: Application scenarios overview. Besides generating diverse and realistic 3D indoor layouts, a single trained DeBaRA model can be employed to execute several related tasks by tweaking the initial sampling noise level $\sigma_{\text{max}}$ and/or performing object- or attribute-level layout inpainting. Our novel Self Score Evaluation (SSE) procedure enables 3D Scene Synthesis by efficiently selecting conditioning semantics from external sources using density estimates provided by the pretrained model.
  • Figure 2: DeBaRA architecture and training overview. At each iteration, the 3D bounding-box parameters $(\boldsymbol{p}, \boldsymbol{r}, \boldsymbol{d})$ of the indoor scene's objects $\mathcal{O}$ are perturbed with Gaussian noise $\sigma \boldsymbol{\epsilon}$. The floor plan $\mathcal{F}$, the noise level $\sigma$ and the resulting noisy objects $\mathcal{O}_\sigma$ are processed by dedicated encoders to form an unordered set of representations $\mathcal{T}$, which is fed as input to a transformer encoder. The updated object embeddings $\hat{\mathcal{T}}_{o}$ are finally decoded back into their predicted clean spatial configuration $(\hat{\boldsymbol{p}}, \hat{\boldsymbol{r}}, \hat{\boldsymbol{d}})$. Trainable modules are optimized by minimizing a semantic-aware Chamfer loss. Input object categories $\boldsymbol{c}$ are randomly dropped so that the model learns both the class-conditional and the unconditional 3D layout distributions.
  • Figure 3: We compare our method with established baselines on generating a 3D layout from a floor plan and a set of object categories. DeBaRA produces fewer failure cases while consistently generating regular arrangements within the room's bounds.
  • Figure 4: Qualitative results on scene re-arrangement (left) and completion (right). DeBaRA recovers a plausible layout from a messy one and faithfully accounts for the initial configuration.
  • Figure 5: Top-down views of scenes generated by DeBaRA from several conditioning candidates provided by an LLM, with their associated SSE values. Lower scores (green) qualitatively correspond to more natural layouts, while higher scores (red) can be filtered out.
  • ...and 11 more figures
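The training step described in the Figure 2 caption can be sketched as follows. The caption does not specify the exact form of the semantic-aware Chamfer loss, so this minimal sketch assumes one plausible variant: a symmetric Chamfer distance over per-object attribute vectors in which pairs of different categories receive a large additive penalty (`class_penalty`, a hypothetical parameter). The log-normal noise-level distribution (`p_mean`, `p_std`, borrowed from EDM), the drop probability `p_drop` and the `null_class` token are also assumptions, not values from the paper.

```python
import numpy as np


def semantic_chamfer(pred, target, pred_cls, target_cls, class_penalty=1e3):
    """Symmetric Chamfer distance between two sets of per-object attribute vectors.

    Pairs with different categories get a large additive penalty (an assumed way of
    making the loss "semantic-aware"), so each object is matched to the nearest
    object of the same category.
    """
    dist = np.sum((pred[:, None, :] - target[None, :, :]) ** 2, axis=-1)  # (n_pred, n_tgt)
    dist = dist + class_penalty * (pred_cls[:, None] != target_cls[None, :])
    return dist.min(axis=1).mean() + dist.min(axis=0).mean()


def make_training_example(boxes, classes, rng, p_drop=0.1, p_mean=-1.2, p_std=1.2, null_class=-1):
    """Perturb box attributes (p, r, d) with Gaussian noise and randomly drop categories.

    boxes: (n_objects, n_attrs) clean attributes; classes: (n_objects,) integer labels.
    Returns (noisy boxes O_sigma, sigma, conditioning classes).
    """
    sigma = float(np.exp(p_mean + p_std * rng.standard_normal()))  # log-normal sigma (EDM-style)
    noisy = boxes + sigma * rng.standard_normal(boxes.shape)       # O_sigma = O + sigma * eps
    # With probability p_drop, hide all categories so the unconditional distribution is learned too.
    cond = np.full_like(classes, null_class) if rng.random() < p_drop else classes
    return noisy, sigma, cond
```

A denoiser would then map `(noisy, sigma, cond)` plus the floor plan to predicted clean boxes, and the loss would be `semantic_chamfer(pred, boxes, classes, classes)`. Because a Chamfer term is invariant to object order, two interchangeable objects of the same category predicted in swapped slots are not penalized.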
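The captions of Figures 1 and 5 describe Self Score Evaluation only at a high level: the pretrained model scores conditioning candidates proposed by an LLM, and lower scores indicate more natural layouts. Since the exact scoring rule is not given here, the following sketch substitutes a common heuristic: the noise-weighted denoising error of a generated layout, averaged over a few noise levels, used as a proxy for negative log-likelihood. The `denoiser(x, sigma, cond)` signature, the noise levels and `n_draws` are hypothetical choices, and this should not be read as the paper's SSE definition.

```python
import numpy as np


def self_score(denoiser, layout, cond, sigmas=(0.1, 0.5, 1.0, 2.0), n_draws=8, seed=0):
    """Noise-weighted denoising error of `layout` under conditioning `cond`.

    Lower values mean the model reconstructs the layout more easily, used here as a
    proxy for higher likelihood. Hypothetical stand-in for the paper's SSE score.
    """
    rng = np.random.default_rng(seed)
    errors = []
    for sigma in sigmas:
        for _ in range(n_draws):
            noisy = layout + sigma * rng.standard_normal(layout.shape)
            pred = denoiser(noisy, sigma, cond)
            errors.append(np.mean((pred - layout) ** 2) / sigma ** 2)
    return float(np.mean(errors))


def select_candidate(denoiser, candidates):
    """candidates: list of (cond, layout) pairs, e.g. LLM-proposed object lists, each
    paired with a layout generated for it. Returns (best index, all scores)."""
    scores = [self_score(denoiser, layout, cond) for cond, layout in candidates]
    return int(np.argmin(scores)), scores
```

Reusing the same `seed` for every candidate means all candidates are scored against identical noise draws, which reduces the variance of the comparison when ranking them.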