SceneWeaver: All-in-One 3D Scene Synthesis with an Extensible and Self-Reflective Agent
Yandan Yang, Baoxiong Jia, Shujie Zhang, Siyuan Huang
TL;DR
SceneWeaver addresses open-ended, physically plausible 3D indoor scene synthesis for embodied AI with a reflective, agentic framework. It unifies diverse generation tools through a standardized interface and governs their use with a reason-act-reflect loop, powered by a physics-aware executor. The approach achieves state-of-the-art results on both common and open-vocabulary room types, with no collisions or boundary violations and strong instruction following, as shown by quantitative metrics and human studies. Comprehensive ablations demonstrate the necessity of iterative reflection, tool diversity, and physics-based refinement for high-quality, configurable scene synthesis. The work advances toward general-purpose, controllable 3D environment generation and offers a scalable framework for integrating future scene-generation tools and assets.
Abstract
Indoor scene synthesis has become increasingly important with the rise of Embodied AI, which requires 3D environments that are not only visually realistic but also physically plausible and functionally diverse. While recent approaches have advanced visual fidelity, they often remain constrained to fixed scene categories, lack sufficient object-level detail and physical consistency, and struggle to align with complex user instructions. In this work, we present SceneWeaver, a reflective agentic framework that unifies diverse scene synthesis paradigms through tool-based iterative refinement. At its core, SceneWeaver employs a language model-based planner to select from a suite of extensible scene generation tools, ranging from data-driven generative models to visual- and LLM-based methods, guided by self-evaluation of physical plausibility, visual realism, and semantic alignment with user input. This closed-loop reason-act-reflect design enables the agent to identify semantic inconsistencies, invoke targeted tools, and update the environment over successive iterations. Extensive experiments on both common and open-vocabulary room types demonstrate that SceneWeaver not only outperforms prior methods on physical, visual, and semantic metrics, but also generalizes effectively to complex scenes with diverse instructions, marking a step toward general-purpose 3D environment generation. Project website: https://scene-weaver.github.io/.
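The closed-loop reason-act-reflect design described above can be illustrated with a minimal Python sketch. This is a toy under stated assumptions, not the SceneWeaver implementation: the `Scene` class, the two tools, the rule-based `plan` function (standing in for the language model-based planner), and the `reflect` checks are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Scene:
    """Toy scene state: object names plus a count of unresolved collisions."""
    objects: List[str] = field(default_factory=list)
    collisions: int = 0


# Hypothetical standardized tool interface: every tool maps a scene to a new scene.
Tool = Callable[[Scene], Scene]


def add_furniture(scene: Scene) -> Scene:
    # Stand-in for a generative tool; assume each call introduces one collision.
    return Scene(scene.objects + ["table", "chair"], scene.collisions + 1)


def resolve_collisions(scene: Scene) -> Scene:
    # Stand-in for physics-based refinement; assume it clears all collisions.
    return Scene(list(scene.objects), 0)


TOOLS: Dict[str, Tool] = {
    "add_furniture": add_furniture,
    "resolve_collisions": resolve_collisions,
}


def reflect(scene: Scene, min_objects: int) -> Dict[str, bool]:
    # Self-evaluation step: toy proxies for physical and semantic checks.
    return {
        "physical": scene.collisions == 0,
        "semantic": len(scene.objects) >= min_objects,
    }


def plan(feedback: Dict[str, bool]) -> str:
    # Rule-based stand-in for an LLM planner: fix missing content first,
    # then repair physical problems.
    if not feedback["semantic"]:
        return "add_furniture"
    return "resolve_collisions"


def synthesize(min_objects: int = 4, max_iters: int = 10) -> Tuple[Scene, List[str]]:
    scene = Scene()
    history: List[str] = []
    for _ in range(max_iters):
        feedback = reflect(scene, min_objects)  # reflect
        if all(feedback.values()):
            break                               # every check passes
        tool_name = plan(feedback)              # reason
        scene = TOOLS[tool_name](scene)         # act
        history.append(tool_name)
    return scene, history


if __name__ == "__main__":
    final_scene, calls = synthesize(min_objects=4)
    print(calls, final_scene)
```

Because every tool shares one signature, a new generator can be registered in `TOOLS` without touching the loop, which is the kind of extensibility the abstract attributes to the tool suite.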
