
LLM4DESIGN: An Automated Multi-Modal System for Architectural and Environmental Design

Ran Chen, Xueqi Yao, Xuhui Jiang

TL;DR

LLM4DESIGN tackles the problem of generating architectural and environmental designs that are both imaginative and executable. It combines a multi-agent system to stimulate creative thinking, Retrieval Augmented Generation to ground ideas in existing cases, and Visual Language Models to keep narratives and visuals consistent, supported by a new cross-modal design dataset. The methodology includes a design-language Aligner, a debate between innovation and retrieval agents, a concluding agent that synthesizes the narrative, and a Visual Agent for rendering, all evaluated against expert-designed criteria and baselines such as SD and DALLE3. The main results show robust design scores, strong multimodal alignment, and evidence that the proposed architecture can outperform the baselines in practical design tasks, with ablations highlighting the importance of each component. The work advances automated architectural design by delivering a scalable, cross-modal pipeline and a dataset to support future research.

Abstract

This study introduces LLM4DESIGN, a highly automated system for generating architectural and environmental design proposals. Relying solely on site conditions and design requirements, LLM4DESIGN employs Multi-Agent systems to foster creativity, Retrieval Augmented Generation (RAG) to ground designs in realism, and Visual Language Models (VLM) to synchronize all information. The system produces coherent, multi-illustrated, and multi-textual design schemes, meeting the dual needs of narrative storytelling and objective drawing presentation in architectural and environmental design proposals. Extensive comparative and ablation experiments confirm the innovativeness of LLM4DESIGN's narrative and the grounded applicability of its plans, demonstrating its superior performance in the field of urban renewal design. Lastly, we have created the first cross-modal design scheme dataset covering architecture, landscape, interior, and urban design, providing rich resources for future research.
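
The flow described above (Aligner, debate between an innovation agent and a retrieval agent, a concluding agent, then a Visual Agent) can be pictured with a minimal sketch. This is not the authors' implementation: every name here (`run_pipeline`, `DesignProposal`, `stub`, the `rounds` parameter) is hypothetical, the agents are stand-in functions rather than real LLM, RAG, or VLM calls, and the turn order of the debate and the one-prompt-per-paragraph rendering step are assumptions made only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical type: an agent maps a text prompt to a text reply.
Agent = Callable[[str], str]


@dataclass
class DesignProposal:
    narrative: str
    image_prompts: List[str] = field(default_factory=list)


def run_pipeline(site: str, requirements: str,
                 aligner: Agent, innovator: Agent, retriever: Agent,
                 concluder: Agent, visual: Agent,
                 rounds: int = 2) -> DesignProposal:
    # 1. Aligner: restate the raw inputs in design language.
    brief = aligner(f"Site: {site}\nRequirements: {requirements}")
    transcript = [brief]
    # 2. Debate (assumed order): the innovation agent proposes an idea,
    #    then the retrieval agent responds with grounding from known cases.
    for _ in range(rounds):
        idea = innovator("\n".join(transcript))
        transcript.append(idea)
        transcript.append(retriever(idea))
    # 3. Concluding agent: synthesize the debate into one narrative.
    narrative = concluder("\n".join(transcript))
    # 4. Visual agent (assumed): one rendering prompt per non-empty paragraph.
    prompts = [visual(p) for p in narrative.split("\n") if p.strip()]
    return DesignProposal(narrative, prompts)


def stub(tag: str) -> Agent:
    # Stand-in agent: tags the last line of the prompt it receives.
    return lambda prompt: f"[{tag}] {prompt.splitlines()[-1]}"


proposal = run_pipeline(
    "riverside lot", "community park",
    aligner=stub("align"), innovator=stub("idea"), retriever=stub("case"),
    concluder=stub("final"), visual=stub("render"), rounds=1,
)
print(proposal.narrative)
print(proposal.image_prompts)
```

Passing the agents in as plain callables keeps the sketch independent of any particular model or retrieval backend; real agents would replace the stubs without changing the control flow.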

Paper Structure

This paper contains 27 sections, 1 equation, 7 figures, and 4 tables.

Figures (7)

  • Figure 1: Main Workflow
  • Figure 2: A case of design language
  • Figure 3: Details for agent framework
  • Figure 4: Agent_V
  • Figure 5: Case Study 1
  • ...and 2 more figures