Large Language Models as Minecraft Agents

Chris Madge; Massimo Poesio

TL;DR

The paper investigates using large language models as agents in a Minecraft Collaborative Builder Task, focusing on builder and architect roles and enabling clarification questions. It introduces a web-based platform for online interaction and benchmarks GPT-4/3.5, Llama2, and Vicuna against IGLU baselines, showing that GPT-4/3.5 achieve strong builder performance while smaller models struggle to adhere to structured JSON outputs. The study demonstrates the feasibility of LLM-based embodied agents in a voxel world, with explicit mechanisms for asking and answering clarifications, and discusses challenges for architect-level tasks. The results suggest that with prompting and potential fine-tuning, LLMs can serve as effective agents for structured construction tasks, enabling scalable evaluation and future development.
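The point about smaller models failing to adhere to structured JSON output can be illustrated with a minimal sketch of how a builder reply might be checked before it is applied to the voxel world. The field names (`add`, `remove`, `question`) and the function `parse_builder_reply` are hypothetical and chosen only for illustration; the summary above does not specify the schema actually used in the paper.

```python
import json

# Hypothetical schema: these field names are illustrative assumptions,
# not the output format defined in the paper.
REQUIRED_FIELDS = {"add": list, "remove": list, "question": str}


def parse_builder_reply(raw: str):
    """Return (action, error); exactly one of the two is None."""
    try:
        reply = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Free-text replies from a model that ignored the format land here.
        return None, f"not valid JSON: {exc.msg}"
    if not isinstance(reply, dict):
        return None, "top-level value must be a JSON object"
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in reply:
            return None, f"missing field: {name}"
        if not isinstance(reply[name], expected_type):
            return None, f"field {name} has wrong type"
    return reply, None
```

Under this assumed schema, a reply that is plain prose or that omits a field is rejected with an error message rather than being interpreted as a build action, which is one way format adherence could be measured across models.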

Abstract

In this work we examine the use of Large Language Models (LLMs) in the challenging setting of acting as a Minecraft agent. We apply and evaluate LLMs in the builder and architect settings, introduce clarification questions, and examine the challenges and opportunities for improvement. In addition, we present a platform for online interaction with the agents and an evaluation against previous work.

Paper Structure (19 sections, 4 figures, 1 table)

Introduction
Related Work
The Original Architect and Builder Models for the Minecraft Task
IGLU
Using LLMs to generate Embodied AI in an Interactive Environment
Generating and Responding to Clarification Questions
Methodology
The Builder
The Architect
The LLMs
Results
Builder
Architect
Conclusion
Transcripts
...and 4 more sections

Figures (4)

Figure 1: Builder clarification and reference (GPT-4)
Figure 2: Builder complex structures (GPT-4)
Figure 3: Architect - Simple Structure (GPT-4)
Figure 4: Architect - Complex Structure (GPT-4)
