Large Language Models as Minecraft Agents
Chris Madge, Massimo Poesio
TL;DR
The paper investigates using large language models as agents in the Minecraft Collaborative Builder Task, covering both the builder and architect roles and enabling the agents to ask clarification questions. It introduces a web-based platform for online interaction and benchmarks GPT-4, GPT-3.5, Llama2, and Vicuna against IGLU baselines. GPT-4 and GPT-3.5 achieve strong builder performance, while the smaller models struggle to adhere to the required structured JSON output. The study demonstrates the feasibility of LLM-based embodied agents in a voxel world, with explicit mechanisms for asking and answering clarification questions, and discusses the challenges of the architect role. The results suggest that with prompting, and potentially fine-tuning, LLMs can serve as effective agents for structured construction tasks, enabling scalable evaluation and future development.
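The difficulty smaller models have with structured JSON output is easier to picture with a concrete check. The following is a minimal sketch, assuming a builder reply is either a clarification question or a list of block placements and removals. The schema (the `question` and `actions` keys, the `place`/`remove` action names, the `colour` field) and the 11 x 9 x 11 grid bounds are illustrative assumptions, not the paper's actual format, and `parse_builder_reply`, `BlockAction` and `Clarification` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import List, Optional, Union

# Hypothetical schema: the paper's exact output format is not reproduced here.
VALID_ACTIONS = {"place", "remove"}
# Assumed build-zone bounds (IGLU-style 11 x 9 x 11 grid); adjust to the real environment.
GRID = {"x": range(11), "y": range(9), "z": range(11)}


@dataclass(frozen=True)
class BlockAction:
    action: str            # "place" or "remove"
    x: int
    y: int
    z: int
    colour: Optional[str]  # required for "place", None for "remove"


@dataclass(frozen=True)
class Clarification:
    question: str          # question the builder asks the architect


def parse_builder_reply(text: str) -> Union[List[BlockAction], Clarification]:
    """Parse a reply that must be {"question": ...} or {"actions": [...]}.

    Raises ValueError when the reply is not valid JSON or breaks the assumed schema.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")

    # Branch 1: the builder asks a clarification question instead of acting.
    if "question" in data:
        question = data["question"]
        if not isinstance(question, str) or not question.strip():
            raise ValueError("question must be a non-empty string")
        return Clarification(question.strip())

    # Branch 2: the builder returns a list of block actions.
    raw_actions = data.get("actions")
    if not isinstance(raw_actions, list):
        raise ValueError('reply needs a "question" string or an "actions" list')

    actions = []
    for i, item in enumerate(raw_actions):
        if not isinstance(item, dict):
            raise ValueError(f"action {i} is not an object")
        kind = item.get("action")
        if kind not in VALID_ACTIONS:
            raise ValueError(f"action {i}: unknown action {kind!r}")
        coords = {}
        for axis, bounds in GRID.items():
            value = item.get(axis)
            # bool is a subclass of int, so reject it explicitly.
            if not isinstance(value, int) or isinstance(value, bool) or value not in bounds:
                raise ValueError(f"action {i}: {axis}={value!r} is outside the build zone")
            coords[axis] = value
        colour = item.get("colour")
        if kind == "place" and not isinstance(colour, str):
            raise ValueError(f"action {i}: 'place' needs a colour string")
        actions.append(BlockAction(kind, coords["x"], coords["y"], coords["z"],
                                   colour if kind == "place" else None))
    return actions


if __name__ == "__main__":
    print(parse_builder_reply(
        '{"actions": [{"action": "place", "x": 5, "y": 0, "z": 5, "colour": "red"}]}'))
    # [BlockAction(action='place', x=5, y=0, z=5, colour='red')]
    print(parse_builder_reply('{"question": "Which colour should the tower be?"}'))
    # Clarification(question='Which colour should the tower be?')
```

Rejecting a malformed reply outright, rather than attempting a repair, is one possible design choice: it makes JSON adherence directly measurable, since an evaluation harness could simply count the `ValueError`s raised for each model.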
Abstract
In this work we examine the use of Large Language Models (LLMs) in the challenging setting of acting as a Minecraft agent. We apply and evaluate LLMs in the builder and architect settings, introduce clarification questions, and examine the challenges and opportunities for improvement. In addition, we present a platform for online interaction with the agents and an evaluation against previous work.
