HouseTune: Two-Stage Floorplan Generation with LLM Assistance

Ziyang Zong, Guanying Chen, Zhaohuan Zhan, Fengcheng Yu, Guang Tan

TL;DR

HouseTune addresses natural-language floorplan generation by decoupling reasoning from geometric refinement in two stages: an LLM uses Chain-of-Thought prompting to generate an initial layout (Layout-Init), which a dual-conditioned diffusion model then refines into the final layout (Layout-Final). The forward and reverse diffusion processes are conditioned on the Layout-Init to enforce geometric and spatial constraints, and a Transformer-based architecture handles the denoising with both continuous and discrete coordinate representations. On the RPlan dataset, HouseTune achieves state-of-the-art performance, notably boosting diversity by about 28% and improving compatibility by about 79% over prior diffusion-based methods, while remaining robust to different LLMs and prompting strategies. The approach reduces dependence on extensive domain-specific training data and broadens accessibility to non-expert users, with potential to extend to other architectural design tasks in the future.

Abstract

This paper proposes a two-stage text-to-floorplan generation framework that combines the reasoning capability of Large Language Models (LLMs) with the generative power of diffusion models. In the first stage, we leverage a Chain-of-Thought (CoT) prompting strategy to guide an LLM in generating an initial layout (Layout-Init) from natural language descriptions, which ensures a user-friendly and intuitive design process. However, Layout-Init may lack precise geometric alignment and fine-grained structural details. To address this, the second stage employs a conditional diffusion model to refine Layout-Init into a final floorplan (Layout-Final) that better adheres to physical constraints and user requirements. Unlike prior methods, our approach effectively reduces the difficulty of floorplan generation learning without the need for extensive domain-specific training data. Experimental results demonstrate that our approach achieves state-of-the-art performance across all metrics, which validates its effectiveness in practical home design applications.
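
To make concrete why a coarse Layout-Init may need refinement, the following minimal sketch checks an initial layout for overlapping rooms. It is an illustration only: the abstract does not specify how Layout-Init is represented, so the axis-aligned `Room` boxes, the `overlap_area` and `overlapping_pairs` helpers, and the sample coordinates are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Room:
    """Hypothetical room record: an axis-aligned box with corners (x0, y0) and (x1, y1)."""
    name: str
    x0: float
    y0: float
    x1: float
    y1: float


def overlap_area(a: Room, b: Room) -> float:
    """Area shared by two boxes; zero if they only touch or are disjoint."""
    width = min(a.x1, b.x1) - max(a.x0, b.x0)
    height = min(a.y1, b.y1) - max(a.y0, b.y0)
    return max(width, 0.0) * max(height, 0.0)


def overlapping_pairs(rooms):
    """List (name_a, name_b, area) for every pair of rooms that overlap."""
    pairs = []
    for a, b in combinations(rooms, 2):
        area = overlap_area(a, b)
        if area > 0:
            pairs.append((a.name, b.name, area))
    return pairs


# A made-up coarse layout in which the bedroom intrudes into the living room.
layout_init = [
    Room("living_room", 0, 0, 6, 5),
    Room("bedroom", 5, 0, 9, 4),
    Room("kitchen", 0, 5, 4, 8),
]

issues = overlapping_pairs(layout_init)
```

In this made-up example the check flags the living room and bedroom pair, while the kitchen, which only shares a wall with the living room, is not reported. A defect of this kind is the sort of geometric misalignment that the second stage is meant to correct.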

Paper Structure

This paper contains 13 sections, 11 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Comparison of different floorplan generation pipelines. (a) Graph-to-floorplan approach (e.g., HouseDiffusion), where rooms are represented as nodes and doors as edges, forming a graph [zhuo2022efficient, zhuo2022proximity, zhuo2024partitioning] that represents spatial relationships; (b) Text-to-floorplan approach, which directly maps a natural language description to a house layout; (c) Our two-stage pipeline, where an LLM is used to generate an initial layout, Layout-Init, according to the user's textual specification. The initial solution serves as a condition for generating the final layout, Layout-Final, through a diffusion model.
  • Figure 2: Training and testing processes of our method. (a) An example of the LLM generating a Layout-Init according to user demands. (b) During training, given a house layout sample, we use the LLM to describe it. The description is used to mimic the user's demands. Using multiple examples like the one in (a) as demos, we ask the LLM to generate a Layout-Init for each sample. These initial layouts serve as conditions for the generator, which outputs the corresponding Layout-Final results. (c) At test time, given a textual description from the user, we again use demos like the one in (a) to obtain a Layout-Init, which goes through the diffusion model to generate the Layout-Final.
  • Figure 3: CoT-based prompting facilitates generation of Layout-Init and its function in training and testing. (a) An example showing how to interact with the LLM to obtain Layout-Init. The Initialization section defines the LLM's role as a house designer, and standardizes output format; the Chain of Thought section directs the LLM to create a house layout step by step, ensuring plausible room placement and sizing. (b) Invoking the Layout-Init generation during training and testing.
  • Figure 4: Conditional diffusion network for refining Layout-Init. The forward process takes the ground-truth house layout $x^g_0$ and the Layout-Init $x^i_0$ and adds Gaussian noise to create a noisy house layout sample $x_T$. The reverse process takes a noisy house layout at timestep $t$ with Layout-Init as the condition. Two encoders are used to obtain the latent representations of $x^g_t$ and $x^i_t$.
  • Figure 5: Generation samples from Tell2Design, HouseDiffusion and HouseTune. The results of HouseTune align well with user requirements in terms of room count and type, with plausible room arrangement. The results produced by Tell2Design exhibit high similarity to the Reference layouts, demonstrating limited diversity. However, the number of generated houses deviates from the Reference, indicating inconsistencies in quantity. HouseDiffusion performs reasonably well; yet, in some cases, it generates gaps or holes within the house (indicated by the dashed rectangles), along with misaligned object placement, which makes the layout unrealistic.
  • ...and 2 more figures
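
The forward process described in the Figure 4 caption can be illustrated with the standard closed-form DDPM noising step, $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$. The sketch below is a generic illustration under stated assumptions, not the paper's exact formulation: the caption does not say how the Layout-Init $x^i_0$ enters the noising step, so only the ground-truth coordinates are noised here, and the linear beta schedule, the 1000-step horizon, and the helper names `linear_alpha_bar` and `q_sample` are assumptions.

```python
import math
import random


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bars = []
    product = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        product *= 1.0 - beta
        alpha_bars.append(product)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t given x_0: scale the clean coordinates and add Gaussian noise."""
    a = alpha_bars[t]
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in x0]


rng = random.Random(0)
alpha_bars = linear_alpha_bar(1000)

# Hypothetical ground-truth room corners, normalized to [-1, 1].
x_g0 = [-0.5, -0.5, 0.5, 0.5]

x_early = q_sample(x_g0, 0, alpha_bars, rng)    # almost identical to x_g0
x_late = q_sample(x_g0, 999, alpha_bars, rng)   # close to pure Gaussian noise
```

At the first step the sample stays very close to the clean coordinates, while at the last step the signal coefficient is nearly zero and the sample is dominated by noise. In a conditional setup such as the one the caption describes, the reverse network would receive the Layout-Init alongside the noisy sample at every step.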