How to Avoid Being Eaten by a Grue: Structured Exploration Strategies for Textual Worlds

Prithviraj Ammanabrolu, Ethan Tien, Matthew Hausknecht, Mark O. Riedl

TL;DR

This work introduces Q*BERT, a QA-driven knowledge-graph agent for text-based games, and extends it with MC!Q*BERT, which adds intrinsic motivation and modular policy chaining to detect and overcome bottlenecks in sparse-reward environments. By building a causal dependency graph of game states and leveraging graph-guided exploration, the authors achieve state-of-the-art results across nine games and demonstrate the ability to bypass challenging bottlenecks like the Grue in Zork. The combination of knowledge-graph state representation, structured exploration, and backtracking policy chains substantially improves sample efficiency and the robustness of long-horizon planning in text worlds. These techniques have potential applications in complex, language-driven planning and dialogue systems, with caveats related to stochastic environments and broader safety considerations.

Abstract

Text-based games are long puzzles or quests, characterized by a sequence of sparse and potentially deceptive rewards. They provide an ideal platform to develop agents that perceive and act upon the world using a combinatorially sized natural language state-action space. Standard Reinforcement Learning agents are poorly equipped to effectively explore such spaces and often struggle to overcome bottlenecks: states that agents are unable to pass through simply because they do not see the right action sequence enough times to be sufficiently reinforced. We introduce Q*BERT, an agent that learns to build a knowledge graph of the world by answering questions, which leads to greater sample efficiency. To overcome bottlenecks, we further introduce MC!Q*BERT, an agent that uses a knowledge-graph-based intrinsic motivation to detect bottlenecks and a novel exploration strategy to efficiently learn a chain of policy modules to overcome them. We present an ablation study and results demonstrating how our method outperforms the current state-of-the-art on nine text games, including the popular game Zork, where, for the first time, a learning agent gets past the bottleneck where the player is eaten by a Grue.
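
To make the idea of a knowledge-graph-based intrinsic motivation more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the intrinsic reward is the number of previously unseen knowledge-graph triples and that a bottleneck is flagged when the score has not improved for a fixed number of steps. The names `KnowledgeGraphTracker`, `BottleneckDetector`, and `patience` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

# A knowledge-graph fact as a (subject, relation, object) triple.
Triple = tuple[str, str, str]


@dataclass
class KnowledgeGraphTracker:
    """Hypothetical helper: rewards the agent for discovering new triples."""

    seen: set[Triple] = field(default_factory=set)

    def intrinsic_reward(self, triples: list[Triple]) -> int:
        # Assumption: reward equals the count of triples not seen before.
        new = set(triples) - self.seen
        self.seen |= new
        return len(new)


@dataclass
class BottleneckDetector:
    """Hypothetical helper: flags a bottleneck after `patience` stalled steps."""

    patience: int
    best_score: float = float("-inf")
    steps_since_improvement: int = 0

    def update(self, score: float) -> bool:
        # Any strict improvement resets the stall counter.
        if score > self.best_score:
            self.best_score = score
            self.steps_since_improvement = 0
            return False
        self.steps_since_improvement += 1
        return self.steps_since_improvement >= self.patience
```

In a training loop, `score` could be the sum of game reward and intrinsic reward; when `update` returns `True`, an agent in this style might freeze the current policy and start a new policy module from the best state reached so far.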

Paper Structure

This paper contains 27 sections, 8 equations, 7 figures, 2 tables, 1 algorithm.

Figures (7)

  • Figure 1: Excerpt from Zork1.
  • Figure 2: Portion of the Zork1 quest structure visualized as a directed acyclic graph. Each node represents a state; clouds represent areas of high branching factor, with labels indicating some of the actions that must be performed to progress.
  • Figure 3: One-step knowledge graph extraction in the Jericho-QA format, and overall Q*BERT architecture at time step $t$. At each step the ALBERT-QA model extracts a relevant highlighted entity set $V_t$ by answering questions based on the observation, which is used to update the knowledge graph.
  • Figure 4: Select ablation results on Zork1 conducted across 5 independent runs per experiment. We see where the agents using structured exploration pass each bottleneck seen in Fig. 2. Q*BERT without IM is unable to detect or surpass bottlenecks beyond the Cellar.
  • Figure 5: Episode initial reward curves for KG-A2C and Q*BERT.
  • ...and 2 more figures