
HSM: Hierarchical Scene Motifs for Multi-Scale Indoor Scene Generation

Hou In Derek Pun, Hou In Ivan Tam, Austin T. Wang, Xiaoliang Huo, Angel X. Chang, Manolis Savva

TL;DR

The paper tackles the challenge of generating indoor scenes with dense object arrangements, including small objects, by introducing Hierarchical Scene Motifs (HSM). HSM decomposes scenes into a hierarchy of support regions across scales, learns and composes scene motifs that encode recurring spatial patterns and relationships, and optimizes layouts by combining VLM guidance with a DFS-based solver. In experiments on the Habitat Synthetic Scenes Dataset (HSSD), HSM outperforms state-of-the-art baselines in fidelity to the input text and in the plausibility of object placements, and ablations confirm the contribution of the motifs, the layout optimization, and the search procedure. The results highlight the value of hierarchical reasoning and motif-based composition for dense, multi-scale indoor scene synthesis conditioned on text descriptions, and point to future work on learned motif libraries and efficiency improvements.
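The TL;DR mentions a DFS-based layout solver. As a rough illustration of depth-first search with backtracking for object placement, the sketch below places axis-aligned footprints one at a time inside a rectangular support region, trying candidate positions on a discrete grid and undoing the most recent placement whenever a later object cannot fit. Everything here is an assumption for illustration: the names `Rect` and `dfs_place`, the grid `step`, and the reduction to non-overlapping 2D rectangles are hypothetical, and HSM's actual solver also uses VLM guidance and motif-level constraints that this sketch does not model.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple

EPS = 1e-9  # tolerance for floating-point boundary checks


@dataclass(frozen=True)
class Rect:
    """Axis-aligned footprint: lower-left corner (x, y), width w, depth d."""
    x: float
    y: float
    w: float
    d: float

    def overlaps(self, other: "Rect") -> bool:
        # Rectangles that only touch along an edge do not count as overlapping.
        return (self.x < other.x + other.w - EPS and other.x < self.x + self.w - EPS
                and self.y < other.y + other.d - EPS and other.y < self.y + self.d - EPS)

    def inside(self, region: "Rect") -> bool:
        return (self.x >= region.x - EPS and self.y >= region.y - EPS
                and self.x + self.w <= region.x + region.w + EPS
                and self.y + self.d <= region.y + region.d + EPS)


def dfs_place(region: Rect, sizes: List[Tuple[float, float]],
              step: float = 0.1) -> Optional[List[Rect]]:
    """Place (w, d) footprints in the given order; return None if impossible."""
    placed: List[Rect] = []

    def candidates(w: float, d: float) -> Iterator[Rect]:
        # Lower-left corners on a regular grid, scanned row by row.
        nx = int(round((region.w - w) / step))
        ny = int(round((region.d - d) / step))
        for j in range(ny + 1):
            for i in range(nx + 1):
                yield Rect(region.x + i * step, region.y + j * step, w, d)

    def search(k: int) -> bool:
        if k == len(sizes):
            return True  # every object has a position
        w, d = sizes[k]
        for cand in candidates(w, d):
            if cand.inside(region) and not any(cand.overlaps(p) for p in placed):
                placed.append(cand)
                if search(k + 1):
                    return True
                placed.pop()  # dead end further down: undo and try the next position
        return False

    return list(placed) if search(0) else None


# Hypothetical example: four half-size items on a 1 x 1 surface.
print(dfs_place(Rect(0.0, 0.0, 1.0, 1.0), [(0.5, 0.5)] * 4))
```

Placing larger footprints first usually prunes the search earlier; this sketch leaves the ordering to the caller.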

Abstract

Despite advances in indoor 3D scene layout generation, synthesizing scenes with dense object arrangements remains challenging. Existing methods focus on large furniture while neglecting smaller objects, resulting in unrealistically empty scenes. Those that place small objects typically do not honor arrangement specifications, resulting in largely random placements that ignore the text description. We present Hierarchical Scene Motifs (HSM): a hierarchical framework for indoor scene generation with dense object arrangements across spatial scales. Indoor scenes are inherently hierarchical, with surfaces supporting objects at different scales, from large furniture on floors to smaller objects on tables and shelves. HSM embraces this hierarchy and exploits recurring cross-scale spatial patterns to generate complex and realistic scenes in a unified manner. Our experiments show that HSM outperforms existing methods by generating scenes that better conform to user input across room types and spatial configurations. The project website is available at https://3dlg-hcvc.github.io/hsm .
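The hierarchy described in the abstract can be pictured as a tree: support regions hold objects, and each placed object may expose new, smaller support regions (a desk top, a shelf compartment) that are then populated in turn. The sketch below shows only that recursion, driven by toy lookup tables. The classes `SupportRegion` and `SceneObject`, the callbacks `propose` and `extract`, and the `CONTENTS` and `SURFACES` tables are hypothetical stand-ins; in HSM these decisions come from language and vision-language models, scene motif generation, and layout optimization, none of which are reproduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class SceneObject:
    name: str
    regions: List["SupportRegion"] = field(default_factory=list)


@dataclass
class SupportRegion:
    name: str
    level: int  # 0 = floor, 1 = surfaces on furniture, and so on
    objects: List[SceneObject] = field(default_factory=list)


def populate(region: SupportRegion,
             propose: Callable[[SupportRegion], List[str]],
             extract: Callable[[SceneObject, int], List[SupportRegion]],
             max_level: int = 2) -> None:
    """Fill one region, then recurse into the regions exposed by new objects."""
    if region.level > max_level:
        return
    for obj_name in propose(region):               # choose objects for this region
        obj = SceneObject(obj_name)
        region.objects.append(obj)                 # layout optimization omitted
        obj.regions = extract(obj, region.level + 1)  # regions at the next scale
        for child in obj.regions:
            populate(child, propose, extract, max_level)


def flatten(region: SupportRegion) -> List[Tuple[int, str]]:
    """List (level, object name) pairs in depth-first order."""
    out: List[Tuple[int, str]] = []
    for obj in region.objects:
        out.append((region.level, obj.name))
        for child in obj.regions:
            out.extend(flatten(child))
    return out


# Toy rules standing in for model decisions (hypothetical).
CONTENTS: Dict[str, List[str]] = {
    "floor": ["desk", "bookshelf"],
    "desk top": ["lamp", "mug"],
    "shelf 1": ["book", "book"],
}
SURFACES: Dict[str, List[str]] = {"desk": ["desk top"], "bookshelf": ["shelf 1"]}

room = SupportRegion("floor", level=0)
populate(room,
         propose=lambda r: CONTENTS.get(r.name, []),
         extract=lambda o, lvl: [SupportRegion(s, lvl) for s in SURFACES.get(o.name, [])])
print(flatten(room))
# [(0, 'desk'), (1, 'lamp'), (1, 'mug'), (0, 'bookshelf'), (1, 'book'), (1, 'book')]
```

Treating every surface, from the floor to a shelf compartment, as the same `SupportRegion` type is what lets one procedure run unchanged at every scale.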

Paper Structure

This paper contains 39 sections, 1 equation, 11 figures, 6 tables, and 1 algorithm.

Figures (11)

  • Figure 1: Overview. Given a room description and optional room boundary as input, HSM decomposes indoor scenes hierarchically and identifies valid support regions (highlighted in pink boxes) at each level of the hierarchy. The system then populates these regions by generating and optimizing object arrangements in a unified manner across scales, producing scenes with dense object arrangements.
  • Figure 2: HSM framework overview. Given an input text description and an optional room boundary, HSM decomposes the input into requirements at different scales and generates the scene through a unified three-stage framework: 1) Extract support regions for object placements; 2) Generate appropriate scene motifs for each region; and 3) Optimize scene motif placements within each region. These steps are repeated across scales to generate a scene that aligns with the input and contains dense small object placements.
  • Figure 3: Scene motif generation process. An input description is first decomposed into a hierarchy of motifs. We then retrieve the corresponding 3D assets and generate the scene motifs iteratively, starting from the innermost motif (on_each_side) and expanding to the outermost motif of the hierarchy (in_front_of). The generated scene motif is visually validated with a VLM.
  • Figure 4: Support region extraction. Given a triangle mesh of the object, such as the example shelf unit, we first extract vertical and horizontal surfaces. We then compute the height clearance for each horizontal surface and use the vertical surfaces to segment them into compartments. The result is a set of support regions that can be populated with objects.
  • Figure 5: Qualitative comparisons at the scene level. Objects and spatial relationships in the input text are highlighted with colors and underlines, and spatial relationships are emphasized using boxes. HSM produces more coherent spatial arrangements and is better aligned to the input compared to existing approaches.
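Figure 4 outlines support region extraction: find horizontal surfaces, measure the height clearance above each one, and split surfaces into compartments using the vertical surfaces. The sketch below applies those steps to a simplified 2D front view of a shelf unit in which the horizontal boards and vertical panels are already given, rather than extracted from a triangle mesh. The types `Board`, `Panel`, and `Region`, the function `extract_regions`, and the `open_top` and `min_clear` defaults are assumptions for illustration, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Board:
    """Horizontal surface in a front view: top at height y, spanning [x0, x1]."""
    y: float
    x0: float
    x1: float


@dataclass(frozen=True)
class Panel:
    """Vertical surface at horizontal position x, spanning heights [y0, y1]."""
    x: float
    y0: float
    y1: float


@dataclass(frozen=True)
class Region:
    """A placeable compartment on a board, with the free height above it."""
    y: float
    x0: float
    x1: float
    clearance: float


def extract_regions(boards: List[Board], panels: List[Panel],
                    open_top: float = 1.0, min_clear: float = 0.05) -> List[Region]:
    regions: List[Region] = []
    for b in boards:
        # Height clearance: gap to the nearest board above that overlaps this one
        # horizontally (computed per board, not per compartment, for simplicity).
        above = [o.y for o in boards if o.y > b.y and o.x0 < b.x1 and b.x0 < o.x1]
        clearance = (min(above) - b.y) if above else open_top
        if clearance < min_clear:
            continue  # too little headroom to hold any object
        # Segment the board at interior panels that rise from its surface.
        cuts = sorted(p.x for p in panels if b.x0 < p.x < b.x1 and p.y0 <= b.y < p.y1)
        edges = [b.x0, *cuts, b.x1]
        for left, right in zip(edges, edges[1:]):
            regions.append(Region(b.y, left, right, clearance))
    return regions


# Toy shelf unit, 0.8 wide: bottom, middle, and top boards, with one divider
# on the bottom level only.
boards = [Board(0.0, 0.0, 0.8), Board(0.4, 0.0, 0.8), Board(0.8, 0.0, 0.8)]
panels = [Panel(0.4, 0.0, 0.4)]
shelf_regions = extract_regions(boards, panels)
for r in shelf_regions:
    print(r)
```

For this toy shelf the result is two compartments on the bottom board, one region on the middle board, and the open top surface, whose clearance falls back to `open_top` because nothing lies above it.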