Will GPT-4 Run DOOM?

Adrian de Wynter

TL;DR

GPT-4 can play Doom to a passable degree: it can manipulate doors, fight enemies, and perform pathing. More complex prompting strategies involving multiple model calls provide better results.

Abstract

We show that GPT-4's reasoning and planning capabilities extend to the 1993 first-person shooter Doom. This large language model (LLM) is able to run and play the game with only a few instructions, plus a textual description--generated by the model itself from screenshots--about the state of the game being observed. We find that GPT-4 can play the game to a passable degree: it is able to manipulate doors, combat enemies, and perform pathing. More complex prompting strategies involving multiple model calls provide better results. While further work is required to enable the LLM to play the game as well as its classical, reinforcement learning-based counterparts, we note that GPT-4 required no training, leaning instead on its own reasoning and observational capabilities. We hope our work pushes the boundaries on intelligent, LLM-based agents in video games. We conclude by discussing the ethical implications of our work.

Paper Structure (24 sections, 1 equation, 8 figures, 1 table)


Figures (8)

  • Figure 1: Architecture of our system. The core piece of our code, the Manager, is a Matplotlib interface able to run Doom on top of the Python binding, itself an interface for the original Doom code in C. The Manager communicates states and actions to the binding, sends screenshots to Vision, and retains and parses the history of previous moves by Agent, Planner, and Experts (if applicable). The three boxes above describe our prompt setup: Vision feeds descriptions to Planner and Agent, and an extra set of calls is performed to obtain Experts.
  • Figure 2: Sample screenshot fed into Vision, taken directly from room C in E1M1. The output from the model is in Figure 3.
  • Figure 3: Vision output corresponding to the screenshot from Figure 2. The model was requested to return a description of the room and the UI, with the UI as a list. This structured output allowed us to maintain consistency between calls. Remark that the model was not informed about either the hazards of the pools, or the meaning of the ammo counters. The prompt is in the Appendix.
  • Figure 4: Map evaluated in this paper, corresponding to Episode 1, Map 1 (E1M1, "Hangar") of Doom (DoomWiki). The trajectory from our walkthrough follows the sequence A, B, C, D, starting at A (leftmost checkered flag) and ending in D (rightmost checkered flag). Depicted is a typical human run with no exploration. The player must flip a switch in D to end the game. Doors are denoted by an orange line. The zig-zag room (C) contains pools of acid (not pictured) that damage the player if walked on.
  • Figure 5: Trajectories by Agent. The starting point is at the leftmost checkered flag; the ending (a switch that must be activated) is the rightmost checkered flag. Other areas (e.g., G, J) are secrets. Time-outs are denoted by ✚ and deaths by ✖. Clockwise from the top left: naïve, with walkthrough, with k-levels, and with planner. The naïve prompt could not leave the first room and frequently got stuck. It died twice at the hands of the zombies in room B, and once due to shooting an explosive barrel. The walkthrough prompt made it to the second room, but often got stuck in the corners and was killed by the zombies. The planning prompt was able to kill the zombies in room B, and almost finished the map. K-levels was consistently able to reach room C, although it died by stepping on the acid. It once managed to open a secret room, but could not backtrack and timed out.
  • ...and 3 more figures
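
The control flow described in the Figure 1 caption can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the paper's code: the class `Manager`, its `step` method, and the stub functions `fake_vision`, `fake_planner`, and `fake_agent` are all hypothetical names. The stubs stand in for GPT-4 calls, the Doom binding is replaced by raw bytes, and the Experts calls are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Manager:
    """Hypothetical stand-in for the Manager in Figure 1 (not the paper's code)."""
    # Vision: screenshot bytes -> textual description of the scene.
    vision: Callable[[bytes], str]
    # Agent: (description, history, optional plan) -> next action.
    agent: Callable[[str, List[str], Optional[str]], str]
    # Planner is optional, mirroring the prompt setups without a planner.
    planner: Optional[Callable[[str, List[str]], str]] = None
    # History of previous moves, retained across steps.
    history: List[str] = field(default_factory=list)

    def step(self, screenshot: bytes) -> str:
        """Run one observe-plan-act cycle and record the chosen action."""
        description = self.vision(screenshot)
        plan = self.planner(description, self.history) if self.planner else None
        action = self.agent(description, self.history, plan)
        self.history.append(action)
        return action


# Stubs standing in for model calls (assumptions, for illustration only).
def fake_vision(screenshot: bytes) -> str:
    return f"a room ({len(screenshot)} bytes of pixels)"


def fake_planner(description: str, history: List[str]) -> str:
    return "open the door"


def fake_agent(description: str, history: List[str], plan: Optional[str]) -> str:
    return "USE" if plan else "MOVE_FORWARD"


manager = Manager(vision=fake_vision, agent=fake_agent, planner=fake_planner)
actions = [manager.step(b"\x00" * 4) for _ in range(2)]
```

Keeping Vision, Planner, and Agent as plain callables makes it easy to swap a stub for a real model call, or to drop the planner entirely to mimic a setup where the agent acts from the description alone.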