3D Building Generation in Minecraft via Large Language Models

Shiying Hu, Zengrong Huang, Chengpeng Hu, Jialin Liu

TL;DR

This work tackles 3D building generation in Minecraft with large language models. It introduces Text to Building in Minecraft (T2BM), a pipeline that combines input refinement, an interlayer JSON encoding, and a repair module, followed by decoding through GDPC. From simple prompts, the method generates complete buildings with facades, interiors, and functional blocks. Experiments show that prompt refinement and a higher-capability model (GPT-4) yield higher completeness $C$ and satisfaction $S$, though the gains show diminishing returns. The key contributions are a concrete interlayer representation that encodes structural and functional blocks, a repair mechanism that fixes common interlayer errors, and empirical evidence that refined prompts improve alignment with user requirements. The approach has practical implications for rapid, constraint-driven 3D content creation in sandbox environments, and it informs prompt engineering and repair-driven workflows for cross-domain procedural content generation. Future work targets broader environments and deeper integration of repair into the prompting process.

Abstract

Procedural content generation with large language models (LLMs) has recently made considerable progress in 2D game level generation, for games such as Super Mario Bros. and Sokoban. To further validate the capabilities of LLMs, this paper explores how they can contribute to generating 3D buildings in the sandbox game Minecraft. We propose a Text to Building in Minecraft (T2BM) model, which involves refining prompts, decoding an interlayer representation, and repairing. The generation supports facades, indoor scenes, and functional blocks such as doors. Experiments evaluate the completeness and satisfaction of buildings generated via LLMs. The results show that LLMs hold significant potential for 3D building generation. Given appropriate prompts, LLMs can generate correct buildings in Minecraft with complete structures and incorporate specific building blocks such as windows and beds, meeting the requirements specified by human users.
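
To make the "interlayer representation" and "repairing" steps more concrete, the following is a minimal Python sketch, not the authors' implementation. The JSON schema (`blocks`, `pos`, `block`), the block palette, the fallback block, and the repair rules are all assumptions chosen for illustration; the paper's actual interlayer format and repair logic may differ. The final hand-off to Minecraft (which the paper performs through GDPC) is omitted, and `decode` only produces a list of placements.

```python
import json

# Hypothetical palette of accepted block names (an assumption for this sketch).
VALID_BLOCKS = {"oak_planks", "glass_pane", "oak_door", "red_bed", "cobblestone"}
FALLBACK_BLOCK = "oak_planks"  # hypothetical replacement for unknown names


def repair(interlayer, size):
    """Return a cleaned copy of a hypothetical interlayer dictionary.

    Assumed repair rules: drop entries with malformed positions, drop blocks
    outside the bounding box, and replace unknown block names.
    """
    sx, sy, sz = size
    fixed = []
    for entry in interlayer.get("blocks", []):
        pos = entry.get("pos")
        # Drop entries whose position is not a list of three integers.
        if not (isinstance(pos, list) and len(pos) == 3
                and all(isinstance(v, int) for v in pos)):
            continue
        x, y, z = pos
        # Drop blocks that fall outside the building's bounding box.
        if not (0 <= x < sx and 0 <= y < sy and 0 <= z < sz):
            continue
        name = entry.get("block")
        # Replace block names that are not in the palette.
        if name not in VALID_BLOCKS:
            name = FALLBACK_BLOCK
        fixed.append({"pos": [x, y, z], "block": name})
    return {"blocks": fixed}


def decode(interlayer):
    """Turn a repaired interlayer into (position, block name) placements."""
    return [(tuple(e["pos"]), e["block"]) for e in interlayer["blocks"]]


# Example LLM output containing three typical errors.
raw = json.loads("""
{"blocks": [
  {"pos": [0, 0, 0], "block": "oak_planks"},
  {"pos": [1, 1, 0], "block": "glass"},
  {"pos": [9, 0, 0], "block": "oak_door"},
  {"pos": [2, 0], "block": "red_bed"}
]}
""")

placements = decode(repair(raw, size=(5, 4, 5)))
print(placements)
```

In this sketch the unknown name `glass` is replaced by the fallback block, while the out-of-bounds door and the entry with a two-element position are dropped, so only two placements remain.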

Paper Structure

This paper contains 25 sections, 15 figures, 1 table, 1 algorithm.

Figures (15)

  • Figure 1: Workflow of Text to Building in Minecraft (T2BM) model.
  • Figure 1: Three types of connecting structures.
  • Figure 2: Buildings in the case of $C \wedge S$ generated without and with refining given the user input "A wooden house with windows".
  • Figure 2: Buildings generated by GPT-3.5 with raw prompts.
  • Figure 3: Buildings generated by GPT-3.5 with refined prompts.
  • ...and 10 more figures