DreamCraft: Text-Guided Generation of Functional 3D Environments in Minecraft
Sam Earle, Filippos Kokkinos, Yuhe Nie, Julian Togelius, Roberta Raileanu
TL;DR
DreamCraft addresses the challenge of generating functional 3D game environments from natural language prompts by learning a quantized NeRF that outputs discrete Minecraft block layouts aligned with descriptions. It combines text-guided neural rendering with differentiable functional constraints, enabling distributional and adjacency control over block types. Compared to post-processed baselines, DreamCraft achieves higher in-game fidelity and prompt alignment, particularly on domain-specific prompts, while maintaining expressive control via NeRF-like representations. The approach is a first step toward democratizing flexible yet functional content creation for game design and RL environment generation, though it currently requires hours per structure and leaves room for speedups and better lighting modeling.
Abstract
Procedural Content Generation (PCG) algorithms enable the automatic generation of complex and diverse artifacts. However, they do not provide high-level control over the generated content and typically require domain expertise. In contrast, text-to-3D methods allow users to specify desired characteristics in natural language, offering a high degree of flexibility and expressivity. But unlike PCG, such approaches cannot guarantee functionality, which is crucial for certain applications like game design. In this paper, we present a method for generating functional 3D artifacts from free-form text prompts in the open-world game Minecraft. Our method, DreamCraft, trains quantized Neural Radiance Fields (NeRFs) to represent artifacts that, when viewed in-game, match given text descriptions. We find that DreamCraft produces more aligned in-game artifacts than a baseline that post-processes the output of an unconstrained NeRF. Thanks to the quantized representation of the environment, functional constraints can be integrated using specialized loss terms. We show how this can be leveraged to generate 3D structures that match a target distribution or obey certain adjacency rules over the block types. DreamCraft inherits a high degree of expressivity and controllability from the NeRF, while still being able to incorporate functional constraints through domain-specific objectives.
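To make the idea of functional constraints as loss terms concrete, the following is a minimal sketch and not the paper's implementation. It assumes the quantized field is available as a grid of per-voxel probabilities over block types with shape (X, Y, Z, K), where axis 1 is height. The KL-divergence form of the distribution term, the "forbidden vertical pair" form of the adjacency term, and the names distribution_loss, adjacency_loss, and forbidden are all illustrative assumptions.

```python
import numpy as np


def distribution_loss(probs, target):
    """KL(target || mean block-type frequency) over the whole grid.

    probs:  array of shape (X, Y, Z, K), per-voxel probabilities over K block types.
    target: array of shape (K,), desired block-type distribution (sums to 1).
    Assumption: KL divergence is one plausible choice, not necessarily the paper's.
    """
    eps = 1e-8
    freq = probs.reshape(-1, probs.shape[-1]).mean(axis=0)
    return float(np.sum(target * (np.log(target + eps) - np.log(freq + eps))))


def adjacency_loss(probs, forbidden):
    """Mean expected count of forbidden vertical block pairs.

    forbidden[a, b] = 1 means block type b may not sit directly on top of
    block type a (hypothetical rule encoding). Axis 1 of probs is height.
    """
    lower = probs[:, :-1, :, :]
    upper = probs[:, 1:, :, :]
    # Probability that each (lower, upper) voxel pair forms a forbidden combination.
    pair = np.einsum("xyza,ab,xyzb->xyz", lower, forbidden, upper)
    return float(pair.mean())


def one_hot(block_ids, num_types):
    """Turn an integer block grid into hard (one-hot) probabilities."""
    return np.eye(num_types)[block_ids]
```

With hard one-hot inputs these terms just count violations; during training they would be applied to soft probabilities so that gradients can flow back to the field. For example, with block types 0 = air and 1 = sand, setting forbidden[0, 1] = 1 penalizes sand resting directly on air.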
