RoomPlanner: Explicit Layout Planner for Easier LLM-Driven 3D Room Generation

Wenzhuo Sun, Mingjian Liang, Wenxuan Song, Xuelian Cheng, Zongyuan Ge

TL;DR

RoomPlanner addresses the challenge of automatic, text-driven 3D indoor scene generation by integrating hierarchical LLM-based reasoning and grounding with explicit layout constraints. It couples a layout-aware planning stage (collision and reachability) with differentiable scene optimization leveraging 3D Gaussian representations and diffusion priors, augmented by the AnyReach camera trajectory and Interval Timestep Flow Sampling to deliver high-quality, editable scenes in under 30 minutes. Key contributions include a fully automated, end-to-end pipeline, explicit spatial constraints for plausible layouts, and a single-pass optimization that yields physically coherent, configurable interiors with improved rendering speed and visual fidelity. The framework demonstrates superior qualitative and quantitative performance against prior methods and supports broad editability, making it practical for design, embodied AI, and virtual production workflows.

Abstract

In this paper, we propose RoomPlanner, the first fully automatic 3D room generation framework for painlessly creating realistic indoor scenes with only short text as input. Without any manual layout design or panoramic image guidance, our framework can generate explicit layout criteria for rational spatial placement. We begin by introducing a hierarchical structure of language-driven agent planners that can automatically parse short and ambiguous prompts into detailed scene descriptions. These descriptions include raw spatial and semantic attributes for each object and the background, which are then used to initialize 3D point clouds. To position objects within bounded environments, we implement two arrangement constraints that iteratively optimize spatial arrangements, ensuring a collision-free and accessible layout solution. In the final rendering stage, we propose a novel AnyReach Sampling strategy for camera trajectory, along with the Interval Timestep Flow Sampling (ITFS) strategy, to efficiently optimize the coarse 3D Gaussian scene representation. These approaches help reduce the total generation time to under 30 minutes. Extensive experiments demonstrate that our method can produce geometrically rational 3D indoor scenes, surpassing prior approaches in both rendering speed and visual quality while preserving editability. The code will be available soon.
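The abstract names two arrangement constraints, one for collisions and one for accessibility, but does not spell out how they are computed. As a rough illustration only, the sketch below checks both properties on a 2D floor plan under simplifying assumptions that are not taken from the paper: objects are axis-aligned rectangles, collisions are tested by rectangle overlap, and accessibility is tested by a breadth-first search over a coarse occupancy grid. All names (`Box`, `collision_free`, `accessible`, the `cell` size) are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned floor footprint of one object (hypothetical type)."""
    name: str
    x: float  # min-x corner, metres
    y: float  # min-y corner, metres
    w: float  # extent along x
    d: float  # extent along y


def overlaps(a: Box, b: Box) -> bool:
    """True if the two footprints share interior area (touching edges is allowed)."""
    return (a.x < b.x + b.w and b.x < a.x + a.w
            and a.y < b.y + b.d and b.y < a.y + a.d)


def collision_free(boxes: list[Box]) -> bool:
    """True if no pair of footprints overlaps."""
    return not any(overlaps(a, b)
                   for i, a in enumerate(boxes) for b in boxes[i + 1:])


def accessible(boxes: list[Box], room_w: float, room_d: float,
               start: tuple[float, float], cell: float = 0.25) -> set[str]:
    """Names of objects bordered by a free cell reachable from `start`.

    Assumption: the room is a room_w x room_d rectangle discretised into
    square cells; a cell is occupied if its centre lies inside a footprint.
    """
    nx, ny = round(room_w / cell), round(room_d / cell)
    owner = [[-1] * nx for _ in range(ny)]  # index of occupying box, or -1
    for j in range(ny):
        for i in range(nx):
            cx, cy = (i + 0.5) * cell, (j + 0.5) * cell
            for k, b in enumerate(boxes):
                if b.x <= cx <= b.x + b.w and b.y <= cy <= b.y + b.d:
                    owner[j][i] = k
                    break

    si, sj = int(start[0] / cell), int(start[1] / cell)
    if owner[sj][si] != -1:
        return set()  # the start position itself is blocked

    seen = {(si, sj)}
    touched: set[str] = set()
    queue = deque([(si, sj)])
    while queue:
        i, j = queue.popleft()
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if not (0 <= ni < nx and 0 <= nj < ny):
                continue
            k = owner[nj][ni]
            if k != -1:
                touched.add(boxes[k].name)  # object borders walkable space
            elif (ni, nj) not in seen:
                seen.add((ni, nj))
                queue.append((ni, nj))
    return touched


# A lamp boxed into a corner by a shelf and a cabinet: no collisions,
# yet the lamp cannot be reached from the middle of the room.
layout = [
    Box("bed", 0.0, 0.0, 2.0, 1.5),
    Box("shelf", 0.0, 3.0, 1.0, 0.5),
    Box("cabinet", 0.5, 3.5, 0.5, 0.5),
    Box("lamp", 0.0, 3.5, 0.5, 0.5),
]
print(collision_free(layout))
print(sorted(accessible(layout, 4.0, 4.0, start=(2.0, 2.0))))
```

An iterative planner could use checks like these as a rejection test or turn them into penalty terms. How RoomPlanner itself formulates and optimizes its constraints is described in the paper body, not here.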

Paper Structure

This paper contains 38 sections, 6 equations, 15 figures, 2 tables, 1 algorithm.

Figures (15)

  • Figure 1: Compared with previous methods, i.e., the visual-guided method Pano2Room [pu2024pano2room] and the rule-based method DreamScene [dreamscene], our approach generates indoor scenes with (a) greater realism, (b) smoother mesh structures, and (c) support for a diverse range of edits, including rotation, translation, importing or deleting 3D assets, and style variations.
  • Figure 2: RoomPlanner follows a 'Reasoning-Grounding, Arrangement, and Optimization' pipeline, decomposing the complex 3D scene generation task into scalable subtasks.
  • Figure 3: We compare interactive 3D scene generation with Set-the-Scene [setthescene], GALA3D [GALA3D], and DreamScene [dreamscene].
  • Figure 4: Effectiveness of ITFS. The coach labels in yellow boxes guide the orientation towards a more rational perspective. As the timesteps progress from $m = 0$ to $m = 3$, the scene quality is optimized from a coarse representation to a fine depiction, enriched with more semantic details.
  • Figure 5: Prompt templates for LLM reasoning to generate the floor and wall height module.
  • ...and 10 more figures