HSM: Hierarchical Scene Motifs for Multi-Scale Indoor Scene Generation
Hou In Derek Pun, Hou In Ivan Tam, Austin T. Wang, Xiaoliang Huo, Angel X. Chang, Manolis Savva
TL;DR
The paper addresses the generation of indoor scenes with dense object arrangements, including small objects, by introducing Hierarchical Scene Motifs (HSM). HSM decomposes a scene into a hierarchy of support regions across scales, learns and composes scene motifs that encode recurring spatial patterns and relationships, and optimizes layouts by combining vision-language model (VLM) guidance with a DFS-based solver. In experiments on the Habitat Synthetic Scenes Dataset, HSM outperforms state-of-the-art baselines in fidelity to the text input and in plausibility of object placements, and ablations confirm the contribution of motifs, layout optimization, and search procedures. The results underline the value of hierarchical reasoning and motif-based composition for realistic, controllable, text-conditioned indoor scene synthesis, and open avenues for learned motif libraries and efficiency improvements.
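To make the idea of a DFS-based layout solver concrete, the following is a minimal sketch of depth-first search with backtracking over object positions on a single support region. It is a generic illustration, not the paper's solver: the function names (`overlaps`, `dfs_place`), the axis-aligned rectangular footprints, the integer centimeter grid, and the `step` parameter are all assumptions made for this example.

```python
from typing import List, Optional, Tuple

# Hypothetical footprint type: (x, y, width, depth), integers in centimeters.
Rect = Tuple[int, int, int, int]


def overlaps(a: Rect, b: Rect) -> bool:
    """Return True if two axis-aligned footprints share interior area."""
    ax, ay, aw, ad = a
    bx, by, bw, bd = b
    return ax < bx + bw and bx < ax + aw and ay < by + bd and by < ay + ad


def dfs_place(
    sizes: List[Tuple[int, int]],
    surface: Tuple[int, int],
    step: int = 5,
    placed: Tuple[Rect, ...] = (),
) -> Optional[List[Rect]]:
    """Place objects one at a time on a rectangular support surface.

    Tries candidate positions on a grid in depth-first order and
    backtracks when a later object cannot be placed without overlap.
    Returns the list of placed footprints, or None if no layout exists
    on this grid.
    """
    if len(placed) == len(sizes):
        return list(placed)  # every object has a position
    w, d = sizes[len(placed)]
    sw, sd = surface
    for x in range(0, sw - w + 1, step):
        for y in range(0, sd - d + 1, step):
            cand = (x, y, w, d)
            if any(overlaps(cand, p) for p in placed):
                continue  # collides with an earlier object
            result = dfs_place(sizes, surface, step, placed + (cand,))
            if result is not None:
                return result  # first feasible layout found
    return None  # dead end: caller tries its next candidate


# Two 60 x 40 cm objects on a 120 x 40 cm shelf fit side by side.
layout = dfs_place([(60, 40), (60, 40)], (120, 40))
```

In a hierarchical setting, the same search could in principle be run once per support region (floor, tabletop, shelf), with each region's placed objects exposing new support surfaces for the next scale; how HSM actually couples the solver with VLM guidance is not specified in this summary.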
Abstract
Despite advances in indoor 3D scene layout generation, synthesizing scenes with dense object arrangements remains challenging. Existing methods focus on large furniture while neglecting smaller objects, resulting in unrealistically empty scenes. Those that do place small objects typically do not honor arrangement specifications, resulting in largely random placements that do not follow the text description. We present Hierarchical Scene Motifs (HSM): a hierarchical framework for indoor scene generation with dense object arrangements across spatial scales. Indoor scenes are inherently hierarchical, with surfaces supporting objects at different scales, from large furniture on floors to smaller objects on tables and shelves. HSM embraces this hierarchy and exploits recurring cross-scale spatial patterns to generate complex and realistic scenes in a unified manner. Our experiments show that HSM outperforms existing methods by generating scenes that better conform to user input across room types and spatial configurations. The project website is available at https://3dlg-hcvc.github.io/hsm
