HiGS: Hierarchical Generative Scene Framework for Multi-Step Associative Semantic Spatial Composition
Jiacheng Hong, Kunzhen Wu, Mingrui Yu, Yichao Gu, Shengze Xue, Shuangjiu Xiao, Deli Dong
TL;DR
HiGS introduces a multi-step, hierarchical approach to 3D scene generation that progresses from global layouts to local details via a Progressive Hierarchical Spatial-Semantic Graph (PHiSSG). By integrating LLM-guided image generation, scene understanding with object-level reconstruction, and relation reasoning, HiGS enables interactive region-focused refinement while automatically completing non-focused areas. Key contributions include the PHiSSG framework, a recursive layout optimization mechanism with stability corrections, and a progressive evolution strategy that merges local refinements back into a coherent global scene. Empirical results show HiGS outperforms single-step methods in layout plausibility, style consistency, and semantic alignment with prompts, while reducing initial input requirements and enabling scalable, controllable 3D scene construction.
Abstract
Three-dimensional scene generation holds significant potential in gaming, film, and virtual reality. However, most existing methods adopt a single-step generation process, making it difficult to balance scene complexity with minimal user input. Inspired by the human cognitive process in scene modeling, which progresses from global to local, focuses on key elements, and completes the scene through semantic association, we propose HiGS, a hierarchical generative framework for multi-step associative semantic spatial composition. HiGS enables users to iteratively expand scenes by selecting key semantic objects, offering fine-grained control over regions of interest while the model completes peripheral areas automatically. To support structured and coherent generation, we introduce the Progressive Hierarchical Spatial-Semantic Graph (PHiSSG), which dynamically organizes spatial relationships and semantic dependencies across the evolving scene structure. PHiSSG ensures spatial and geometric consistency throughout the generation process by maintaining a one-to-one mapping between graph nodes and generated objects and supporting recursive layout optimization. Experiments demonstrate that HiGS outperforms single-step methods in layout plausibility, style consistency, and user preference, offering a controllable and extensible paradigm for efficient 3D scene construction.
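The graph properties named in the abstract (one node per generated object, a hierarchy that can be traversed recursively, and local refinements merged back into the global scene) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class names `Node` and `SceneGraph`, the methods `add`, `world_positions`, and `merge_refinement`, the parent-relative `offset` field, and the asset file names are all hypothetical, and the recursive pass here only resolves positions rather than performing the layout optimization or stability corrections the paper describes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    """One graph node; assumed to correspond to exactly one generated object."""
    node_id: str
    label: str
    offset: Vec3 = (0.0, 0.0, 0.0)  # position relative to the parent (assumption)
    parent: Optional[str] = None
    children: List[str] = field(default_factory=list)


class SceneGraph:
    """Hypothetical stand-in for a hierarchical spatial-semantic graph."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}
        self.objects: Dict[str, str] = {}  # node_id -> asset handle, kept one-to-one

    def add(self, node_id: str, label: str, asset: str,
            parent: Optional[str] = None,
            offset: Vec3 = (0.0, 0.0, 0.0)) -> None:
        if node_id in self.nodes:
            raise ValueError(f"duplicate node id: {node_id}")
        if parent is not None and parent not in self.nodes:
            raise KeyError(f"unknown parent: {parent}")
        self.nodes[node_id] = Node(node_id, label, offset, parent)
        self.objects[node_id] = asset
        if parent is not None:
            self.nodes[parent].children.append(node_id)

    def world_positions(self, root: str,
                        origin: Vec3 = (0.0, 0.0, 0.0)) -> Dict[str, Vec3]:
        """Recursively resolve parent-relative offsets into world positions."""
        node = self.nodes[root]
        pos = tuple(o + d for o, d in zip(origin, node.offset))
        resolved = {root: pos}
        for child_id in node.children:
            resolved.update(self.world_positions(child_id, pos))
        return resolved

    def merge_refinement(self, focus_id: str, local: "SceneGraph",
                         local_root: str) -> None:
        """Attach the descendants of a refined local graph under the focus node."""
        def attach(local_id: str, parent_id: str) -> None:
            for child_id in local.nodes[local_id].children:
                child = local.nodes[child_id]
                new_id = f"{focus_id}/{child_id}"  # namespaced to avoid id clashes
                self.add(new_id, child.label, local.objects[child_id],
                         parent=parent_id, offset=child.offset)
                attach(child_id, new_id)

        attach(local_root, focus_id)


# Global pass: a room containing a desk.
scene = SceneGraph()
scene.add("room", "room", "room.glb")
scene.add("desk", "desk", "desk.glb", parent="room", offset=(2.0, 0.0, 1.0))

# Local pass: the user focuses on the desk, and a lamp is generated on it.
local = SceneGraph()
local.add("desk", "desk", "desk.glb")
local.add("lamp", "lamp", "lamp.glb", parent="desk", offset=(0.5, 0.0, 0.25))

scene.merge_refinement("desk", local, "desk")
positions = scene.world_positions("room")
print(positions["desk/lamp"])  # (2.5, 0.0, 1.25)
```

Storing each child's position relative to its parent is one simple way to keep a merged local refinement consistent with the global layout: moving the desk in a later pass moves the lamp with it, without touching the lamp's node.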
