DreamCraft: Text-Guided Generation of Functional 3D Environments in Minecraft

Sam Earle, Filippos Kokkinos, Yuhe Nie, Julian Togelius, Roberta Raileanu

TL;DR

DreamCraft generates functional 3D game environments from natural language prompts by learning a quantized NeRF that outputs discrete Minecraft block layouts aligned with the given descriptions. It combines text-guided neural rendering with differentiable functional constraints, enabling control over the distribution and adjacency of block types. Compared to a baseline that post-processes the output of an unconstrained NeRF, DreamCraft achieves higher in-game fidelity and prompt alignment, particularly on domain-specific prompts, while retaining the expressive control of NeRF-like representations. The approach is a first step toward democratizing flexible yet functional content creation for game design and RL environment generation, though it currently requires hours per structure, leaving room for speedups and improved lighting modeling.

Abstract

Procedural Content Generation (PCG) algorithms enable the automatic generation of complex and diverse artifacts. However, they don't provide high-level control over the generated content and typically require domain expertise. In contrast, text-to-3D methods allow users to specify desired characteristics in natural language, offering a high amount of flexibility and expressivity. But unlike PCG, such approaches cannot guarantee functionality, which is crucial for certain applications like game design. In this paper, we present a method for generating functional 3D artifacts from free-form text prompts in the open-world game Minecraft. Our method, DreamCraft, trains quantized Neural Radiance Fields (NeRFs) to represent artifacts that, when viewed in-game, match given text descriptions. We find that DreamCraft produces more aligned in-game artifacts than a baseline that post-processes the output of an unconstrained NeRF. Thanks to the quantized representation of the environment, functional constraints can be integrated using specialized loss terms. We show how this can be leveraged to generate 3D structures that match a target distribution or obey certain adjacency rules over the block types. DreamCraft inherits a high degree of expressivity and controllability from the NeRF, while still being able to incorporate functional constraints through domain-specific objectives.
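The abstract's "specialized loss terms" can be illustrated with a minimal sketch. This is not the paper's implementation: the squared-error form of the distribution loss, the expected-pair-count form of the adjacency penalty, the dictionary grid layout, and all function names are assumptions made for illustration. The idea is that a grid of per-cell block-type logits stays differentiable through a softmax, so both constraints can be written as smooth penalties.

```python
import math

def softmax(logits):
    """Turn one cell's block-type logits into probabilities."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distribution_loss(grid, target):
    """Squared error between the grid's mean soft block distribution and `target`.

    `grid` maps (x, y, z) -> logits over block types (hypothetical layout).
    The squared-error form is an assumption, not taken from the paper.
    """
    mean = [0.0] * len(target)
    for logits in grid.values():
        for i, p in enumerate(softmax(logits)):
            mean[i] += p / len(grid)
    return sum((m - t) ** 2 for m, t in zip(mean, target))

def adjacency_penalty(grid, forbidden):
    """Expected number of forbidden block-type pairs among face-adjacent cells."""
    penalty = 0.0
    for (x, y, z), logits in grid.items():
        p_here = softmax(logits)
        # Visit each face-adjacent pair once by only looking in +x, +y, +z.
        for neighbor in ((x + 1, y, z), (x, y + 1, z), (x, y, z + 1)):
            if neighbor not in grid:
                continue
            p_there = softmax(grid[neighbor])
            for a, b in forbidden:
                penalty += p_here[a] * p_there[b]
                if a != b:
                    # Unordered pair: also count the mirrored arrangement.
                    penalty += p_here[b] * p_there[a]
    return penalty

# Two neighbouring cells: one almost surely type 0, the other almost surely type 1.
grid = {(0, 0, 0): [10.0, -10.0], (1, 0, 0): [-10.0, 10.0]}
balance = distribution_loss(grid, target=[0.5, 0.5])
clash = adjacency_penalty(grid, forbidden=[(0, 1)])
```

In a real training loop, terms like these would be weighted and added to the text-image guidance objective; the weighting is left out here because this section does not specify it.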

Paper Structure

This paper contains 15 sections, 7 equations, 10 figures, 6 tables.

Figures (10)

  • Figure 1: Comparison of unconstrained NeRF (left), DreamCraft neural render, and DreamCraft in-game generation for a Planet Minecraft caption. Despite its lower resolution, DreamCraft's generated structures are similar in quality to those of the unconstrained NeRF, both closely matching the corresponding textual description.
  • Figure 2: DreamCraft neural rendered generations for a set of Planet Minecraft captions.
  • Figure 3: DreamCraft produces a distribution of discrete blocks over a grid. Unquantized or "soft" versions of this distribution can be visualized for text-image guidance, while a fully discrete representation can be used to determine functional constraints, and exported to an in-game structure.
  • Figure 4: From 2D textures to 3D structures. Once discretized, block types are mapped to voxel grids corresponding to the appearance of objects in-game. Here, we assemble each block into a 3D voxel grid using 2D textures from Minecraft. We approximate solid blocks by repeating the surface texture in the space inside the block.
  • Figure 5: "medium medieval home" with block grids of various widths.
  • ...and 5 more figures
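
The captions of Figures 3 and 4 describe two steps that a small sketch can make concrete: collapsing the soft block distribution to a discrete grid, and approximating a solid block by repeating a 2D texture through its interior. The code below is a rough illustration under stated assumptions: the argmax rule, the use of a single face texture for every slice, and the names `discretize` and `texture_to_voxels` are hypothetical and not taken from the paper.

```python
def discretize(cell_probs):
    """Pick the most likely block type in each cell (hard version of the soft grid).

    `cell_probs` maps (x, y, z) -> list of block-type probabilities.
    """
    return {
        pos: max(range(len(probs)), key=probs.__getitem__)
        for pos, probs in cell_probs.items()
    }

def texture_to_voxels(texture, depth=None):
    """Approximate a solid block by stacking copies of one 2D texture along depth.

    `texture` is a list of rows of texel values; using the same texture for
    every slice is a simplifying assumption.
    """
    depth = len(texture) if depth is None else depth
    # Copy each row so the slices do not share mutable state.
    return [[row[:] for row in texture] for _ in range(depth)]

soft_grid = {(0, 0, 0): [0.2, 0.8], (1, 0, 0): [0.9, 0.1]}
hard_grid = discretize(soft_grid)

stone_texture = [[1, 2], [3, 4]]  # placeholder texel values
stone_voxels = texture_to_voxels(stone_texture)
```

The soft grid is what text-image guidance would see, while the hard grid is what functional checks and in-game export would use.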