DisCo-Layout: Disentangling and Coordinating Semantic and Physical Refinement in a Multi-Agent Framework for 3D Indoor Layout Synthesis
Jialin Gao, Donghao Zhou, Mingjian Liang, Lihao Liu, Chi-Wing Fu, Xiaowei Hu, Pheng-Ann Heng
TL;DR
DisCo-Layout addresses generalization challenges in 3D indoor layout synthesis by refining semantic and physical aspects separately with two dedicated tools, the Semantic Refinement Tool (SRT) and the Physical Refinement Tool (PRT), and coordinating them with a three-agent, VLM-based framework (planner, designer, evaluator). The planner derives placement rules and groups assets, the designer proposes initial poses, and the evaluator uses visual question answering (VQA) to decide when refinement is needed. SRT then corrects high-level object relationships, while PRT resolves spatial collisions through a grid-matching algorithm. Empirical results show state-of-the-art performance, with zero physical violations and high semantic coherence across diverse scenes and open-domain assets, outperforming baselines such as LayoutGPT, Holodeck, and LayoutVLM. The work highlights the value of modular, tool-based refinement and open-domain generalization for scalable, realistic 3D indoor layout synthesis. The code is to be released for community use.
Abstract
3D indoor layout synthesis is crucial for creating virtual environments. Traditional methods struggle to generalize because they rely on fixed datasets. While recent LLM- and VLM-based approaches offer improved semantic richness, they often lack robust and flexible refinement, resulting in suboptimal layouts. We develop DisCo-Layout, a novel framework that disentangles and coordinates physical and semantic refinement. For independent refinement, our Semantic Refinement Tool (SRT) corrects abstract object relationships, while the Physical Refinement Tool (PRT) resolves concrete spatial issues via a grid-matching algorithm. For collaborative refinement, a multi-agent framework orchestrates these tools, featuring a planner that derives placement rules, a designer that proposes initial layouts, and an evaluator that assesses them. Experiments demonstrate DisCo-Layout's state-of-the-art performance, generating realistic, coherent, and generalizable 3D indoor layouts. Our code will be publicly available.
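
The coordination described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: every name here (`Box`, `overlaps`, `physical_refine`, `refine_loop`, `semantic_ok`, `semantic_refine`) is hypothetical, the VLM-based evaluator and the SRT are replaced by plain callables, and the PRT's grid-matching algorithm is replaced by a simple nearest-free-cell search on a 2D grid. The sketch only shows the general idea of checking semantic and physical quality separately and invoking the matching tool when a check fails.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Box:
    """Axis-aligned 2D footprint of an asset, measured in grid cells (assumed representation)."""
    name: str
    x: int
    y: int
    w: int
    h: int


def overlaps(a: Box, b: Box) -> bool:
    """True if the two footprints share any area (touching edges do not count)."""
    return a.x < b.x + b.w and b.x < a.x + a.w and a.y < b.y + b.h and b.y < a.y + a.h


def in_room(b: Box, room_w: int, room_h: int) -> bool:
    """True if the footprint lies fully inside a room_w x room_h grid."""
    return b.x >= 0 and b.y >= 0 and b.x + b.w <= room_w and b.y + b.h <= room_h


def has_violation(layout: List[Box], room_w: int, room_h: int) -> bool:
    """True if any box leaves the room or any pair of boxes collides."""
    for i, a in enumerate(layout):
        if not in_room(a, room_w, room_h):
            return True
        if any(overlaps(a, b) for b in layout[i + 1:]):
            return True
    return False


def physical_refine(layout: List[Box], room_w: int, room_h: int) -> List[Box]:
    """Hypothetical stand-in for the PRT: place boxes one by one, moving each
    to the free in-room grid position closest (Manhattan distance) to its
    proposed position. This is NOT the paper's grid-matching algorithm."""
    placed: List[Box] = []
    for box in layout:
        candidates = sorted(
            ((x, y)
             for x in range(room_w - box.w + 1)
             for y in range(room_h - box.h + 1)),
            key=lambda p: abs(p[0] - box.x) + abs(p[1] - box.y),
        )
        for x, y in candidates:
            cand = Box(box.name, x, y, box.w, box.h)
            if not any(overlaps(cand, p) for p in placed):
                placed.append(cand)
                break
        else:
            raise ValueError(f"no free position for {box.name}")
    return placed


def refine_loop(
    layout: List[Box],
    room_w: int,
    room_h: int,
    semantic_ok: Callable[[List[Box]], bool],
    semantic_refine: Callable[[List[Box]], List[Box]],
    max_rounds: int = 5,
) -> List[Box]:
    """Hypothetical evaluator loop: check both aspects, then call the
    semantic tool and/or the physical tool until both checks pass."""
    for _ in range(max_rounds):
        sem_bad = not semantic_ok(layout)
        phys_bad = has_violation(layout, room_w, room_h)
        if not sem_bad and not phys_bad:
            break
        if sem_bad:
            layout = semantic_refine(layout)
        if has_violation(layout, room_w, room_h):
            layout = physical_refine(layout, room_w, room_h)
    return layout


if __name__ == "__main__":
    # A designer-style initial proposal in which the desk collides with the bed.
    initial = [Box("bed", 0, 0, 2, 2), Box("desk", 1, 1, 2, 1)]
    final = refine_loop(initial, 4, 4,
                        semantic_ok=lambda l: True,
                        semantic_refine=lambda l: l)
    print(final)
```

In this sketch the semantic check trivially passes, so only the physical stand-in runs. Keeping the two checks and the two tools as separate functions mirrors the disentangled design the abstract describes, in which either refinement can be applied independently.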
