Large Language Models as Minecraft Agents

Chris Madge, Massimo Poesio

TL;DR

The paper investigates using large language models as agents in a Minecraft Collaborative Builder Task, focusing on the builder and architect roles and on enabling clarification questions. It introduces a web-based platform for online interaction and benchmarks GPT-4, GPT-3.5, Llama2, and Vicuna against IGLU baselines, showing that GPT-4 and GPT-3.5 achieve strong builder performance while smaller models struggle to adhere to structured JSON outputs. The study demonstrates the feasibility of LLM-based embodied agents in a voxel world, with explicit mechanisms for asking and answering clarification questions, and discusses challenges for architect-level tasks. The results suggest that, with prompting and potentially fine-tuning, LLMs can serve as effective agents for structured construction tasks, enabling scalable evaluation and future development.

Abstract

In this work we examine the use of Large Language Models (LLMs) in the challenging setting of acting as a Minecraft agent. We apply and evaluate LLMs in the builder and architect settings, introduce clarification questions, and examine the challenges and opportunities for improvement. In addition, we present a platform for online interaction with the agents and an evaluation against previous works.

Paper Structure

This paper contains 19 sections, 4 figures, and 1 table.

Figures (4)

  • Figure 1: Builder clarification and reference (GPT-4)
  • Figure 2: Builder complex structures (GPT-4)
  • Figure 3: Architect - Simple Structure (GPT-4)
  • Figure 4: Architect - Complex Structure (GPT-4)