LLplace: The 3D Indoor Scene Layout Generation and Editing via Large Language Model

Yixuan Yang, Junru Lu, Zixiang Zhao, Zhen Luo, James J. Q. Yu, Victor Sanchez, Feng Zheng

TL;DR

This work introduces LLplace, a dialogue-enabled framework for 3D indoor layout generation and editing that does not rely on spatial priors or in-context exemplars. It fine-tunes Llama3 with LoRA and uses a JSON-based input/output scheme plus meta prompts to guide object placement, while a two-turn dialogue dataset from 3D-Front enables dynamic editing. The approach demonstrates improved layout quality and robust editing over strong LLM-based baselines, as evidenced by FID, OOR, and GPT-4o evaluations. By enabling interactive, priors-free design, LLplace offers practical benefits for flexible interior layout design and space planning.
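The TL;DR mentions a JSON-based input/output scheme combined with a meta prompt. The Python below is a minimal sketch of that idea, not the authors' format. The field names (`room_type`, `objects`, `size`, `position`, `rotation`), the helper names, and the placeholder meta prompt standing in for $\bm{P}_{gen}$ are all hypothetical assumptions.

```python
import json

# Hypothetical placeholder for the generation meta prompt P_gen; the real wording is not given here.
META_PROMPT_GEN = (
    "You are a 3D indoor layout designer. Given the room type and objects with "
    "bounding-box sizes, return JSON giving each object's center position and rotation."
)

def build_generation_instruction(room_type, objects):
    """Wrap user input and retrieved bounding boxes into a single LLM instruction.

    `objects` maps a unique object name (e.g. "nightstand_1") to its bounding-box
    size [x, y, z]; the schema is an illustrative assumption.
    """
    payload = {
        "room_type": room_type,
        "objects": [{"name": name, "size": size} for name, size in objects.items()],
    }
    return META_PROMPT_GEN + "\n" + json.dumps(payload)

def parse_layout_output(text):
    """Parse a JSON reply into {name: (center_xyz, rotation_deg)}; raise on malformed replies."""
    data = json.loads(text)
    layout = {}
    for item in data["objects"]:
        center = [float(v) for v in item["position"]]
        if len(center) != 3:
            raise ValueError(f"expected a 3D center for {item['name']}")
        layout[item["name"]] = (center, float(item["rotation"]))
    return layout

# Usage with a made-up model reply (not real LLplace output).
instr = build_generation_instruction(
    "bedroom", {"double_bed": [2.0, 1.0, 2.2], "nightstand": [0.5, 0.6, 0.4]}
)
reply = (
    '{"objects": [{"name": "double_bed", "position": [0.0, 0.5, 0.0], "rotation": 0},'
    ' {"name": "nightstand", "position": [1.4, 0.3, -0.8], "rotation": 0}]}'
)
layout = parse_layout_output(reply)
```

A structured reply like this can be validated mechanically, because `json.loads` raises on malformed text. That makes it easy to reject unusable model output before building a scene.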

Abstract

Designing 3D indoor layouts is a crucial task with significant applications in virtual reality, interior design, and automated space planning. Existing methods for 3D layout design either rely on diffusion models, which utilize spatial relationship priors, or heavily leverage the inferential capabilities of proprietary Large Language Models (LLMs), which require extensive prompt engineering and in-context exemplars found through black-box trials. These methods often face limitations in generalization and dynamic scene editing. In this paper, we introduce LLplace, a novel 3D indoor scene layout designer based on the open-source LLM Llama3 with lightweight fine-tuning. LLplace circumvents the need for spatial relationship priors and in-context exemplars, enabling efficient and credible room layout generation based solely on user inputs specifying the room type and desired objects. We curated a new dialogue dataset based on the 3D-Front dataset, expanding the original data volume and incorporating dialogue data for adding and removing objects. This dataset can enhance the LLM's spatial understanding. Furthermore, through dialogue, LLplace activates the LLM's capability to understand 3D layouts and perform dynamic scene editing, enabling the addition and removal of objects. Our experiments show that LLplace can effectively generate and edit 3D indoor layouts interactively and outperform existing methods in delivering high-quality 3D design solutions. Code and dataset will be released.
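The abstract describes dialogue data for adding and removing objects. The sketch below shows one plausible way to turn a single complete layout into a two-turn sample: first generate, then edit. This is an assumption, not the authors' curation pipeline. The function name `make_edit_dialogue`, the record fields, and the edit prompt text standing in for $\bm{P}_{edit}$ are hypothetical.

```python
import json
import random

# Hypothetical placeholder for the edit meta prompt P_edit.
META_PROMPT_EDIT = "Edit the current layout as requested and return the full updated layout as JSON."

def make_edit_dialogue(room_type, layout, rng):
    """Build a two-turn (generate, then edit) training sample from one complete layout.

    `layout` is a list of {"name", "size", "position", "rotation"} dicts (assumed schema).
    With equal probability the sample is:
      - "add":    turn 1 lacks one object, and turn 2 adds it back;
      - "remove": turn 1 has every object, and turn 2 removes one.
    """
    if len(layout) < 2:
        raise ValueError("need at least two objects to build an edit sample")
    idx = rng.randrange(len(layout))
    target = layout[idx]
    rest = layout[:idx] + layout[idx + 1:]
    if rng.random() < 0.5:
        op, before, after = "add", rest, layout
        request = f"Add a {target['name']} with size {target['size']}."
    else:
        op, before, after = "remove", layout, rest
        request = f"Remove the {target['name']}."
    turn1 = {
        "input": {"room_type": room_type,
                  "objects": [{"name": o["name"], "size": o["size"]} for o in before]},
        "output": before,
    }
    # Turn 2 combines the edit prompt, the previous layout, and the user's request.
    turn2 = {
        "input": META_PROMPT_EDIT + "\nCurrent layout: " + json.dumps(before) + "\n" + request,
        "output": after,
    }
    return op, [turn1, turn2]

# Usage with a toy two-object bedroom (illustrative values only).
rng = random.Random(0)
toy_layout = [
    {"name": "bed", "size": [2.0, 1.0, 2.2], "position": [0.0, 0.5, 0.0], "rotation": 0.0},
    {"name": "wardrobe", "size": [1.2, 2.0, 0.6], "position": [-1.5, 1.0, 1.2], "rotation": 90.0},
]
op, dialogue = make_edit_dialogue("bedroom", toy_layout, rng)
```

Deriving both edit directions from the same ground-truth layout keeps every unedited object at its original position. This is one simple way to expand the data volume without new annotation.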

Paper Structure (19 sections, 4 equations, 6 figures, 7 tables)

Figures (6)

  • Figure 1: 3D indoor scenes generated by LLplace, compared with LayoutGPT and GPT-4o, and scene-editing results from LLplace, compared with GPT-4o.
  • Figure 2: The pipeline of LLplace. Starting from the upper-left corner, we extract the room type and the user's desired objects from the user input. Then, we retrieve 3D objects and their corresponding bounding boxes. Next, we wrap the user input, bounding-box information, and meta prompt $\bm{P}_{gen}$ into an LLM instruction, as shown in the upper-middle box. Using the LoRA fine-tuned Llama3 model, we obtain the LLM output (upper-right box), which includes the center coordinates and rotation angles of the objects. We then combine this output with the input information to convert it into a 3D layout and render it as a 3D scene (bottom-left corner). To edit the generated 3D scene layout, we combine the previous layout, the user input, and the edit prompt $\bm{P}_{edit}$ into a new instruction. The fine-tuned Llama3 model then generates the new scene, shown at the bottom right.
  • Figure 3: Bedroom and living-room layouts generated by LLplace.
  • Figure 4: Qualitative scene-editing results of LLplace.
  • Figure 5: Additional 3D scene layouts generated by LLplace.
  • Figure 6: (caption not listed)
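In the Figure 2 pipeline, the LLM's output (object centers and rotation angles) is combined with the retrieved bounding boxes to produce a 3D layout. The sketch below shows that conversion for a single object, under stated assumptions: y is the up axis, `size` gives the unrotated box extents along (x, y, z), and the rotation is a yaw about y in degrees. The `inside_room` check is a hypothetical illustration of validating a placement against a rectangular floor. It is not the paper's OOR metric.

```python
import math

def box_corners(center, size, yaw_deg):
    """Return the 8 corners of an object's oriented bounding box (y-up, yaw about y, degrees)."""
    cx, cy, cz = center
    hx, hy, hz = (v / 2.0 for v in size)
    c, s = math.cos(math.radians(yaw_deg)), math.sin(math.radians(yaw_deg))
    corners = []
    for dx in (-hx, hx):
        for dy in (-hy, hy):
            for dz in (-hz, hz):
                # Rotate the horizontal offset (dx, dz) about the vertical y axis.
                rx = c * dx + s * dz
                rz = -s * dx + c * dz
                corners.append((cx + rx, cy + dy, cz + rz))
    return corners

def inside_room(corners, room_min_xz, room_max_xz, tol=1e-6):
    """Check that every corner's floor projection lies within an axis-aligned room rectangle."""
    return all(
        room_min_xz[0] - tol <= x <= room_max_xz[0] + tol
        and room_min_xz[1] - tol <= z <= room_max_xz[1] + tol
        for x, _, z in corners
    )

# Usage: a 2.0 x 1.0 x 2.2 bed at the room center, rotated by 90 degrees.
corners = box_corners((0.0, 0.5, 0.0), (2.0, 1.0, 2.2), 90.0)
ok = inside_room(corners, (-1.5, -1.5), (1.5, 1.5))
```

Because the model predicts only a center and an angle per object, the box extents always come from the retrieved asset. The placement therefore stays consistent with the rendered 3D object.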