How to Avoid Being Eaten by a Grue: Structured Exploration Strategies for Textual Worlds
Prithviraj Ammanabrolu, Ethan Tien, Matthew Hausknecht, Mark O. Riedl
TL;DR
This work introduces Q*BERT, an agent for text-based games that builds a knowledge graph of the world by answering questions, and extends it with MC!Q*BERT, which adds intrinsic motivation and modular policy chaining to detect and overcome bottlenecks in sparse-reward environments. By building a causal dependency graph of game states and using graph-guided exploration, the authors outperform the prior state of the art on nine games and show that the agent can get past hard bottlenecks such as the Grue in Zork. The combination of knowledge-graph state representation, structured exploration, and backtracking policy chains substantially improves sample efficiency and makes long-horizon planning in text worlds more robust. These techniques may apply to complex, language-driven planning and dialogue systems, with caveats related to stochastic environments and broader safety considerations.
Abstract
Text-based games are long puzzles or quests, characterized by a sequence of sparse and potentially deceptive rewards. They provide an ideal platform to develop agents that perceive and act upon the world using a combinatorially sized natural language state-action space. Standard Reinforcement Learning agents are poorly equipped to explore such spaces effectively and often struggle to overcome bottlenecks: states that agents are unable to pass through simply because they do not see the right action sequence enough times to be sufficiently reinforced. We introduce Q*BERT, an agent that learns to build a knowledge graph of the world by answering questions, which leads to greater sample efficiency. To overcome bottlenecks, we further introduce MC!Q*BERT, an agent that uses a knowledge-graph-based intrinsic motivation to detect bottlenecks and a novel exploration strategy to efficiently learn a chain of policy modules to overcome them. We present an ablation study and results demonstrating how our method outperforms the current state-of-the-art on nine text games, including the popular game Zork, where, for the first time, a learning agent gets past the bottleneck where the player is eaten by a Grue.
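
To make the two ideas in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the knowledge graph is a set of (subject, relation, object) triples, that the intrinsic reward is the number of previously unseen triples an observation contributes, and that a bottleneck is flagged when the best game score has not improved for a fixed number of steps. The names `KnowledgeGraph`, `BottleneckDetector`, and `patience` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

# Assumed representation: one fact is a (subject, relation, object) triple.
Triple = Tuple[str, str, str]


@dataclass
class KnowledgeGraph:
    """Hypothetical container for the facts the agent has seen so far."""
    triples: Set[Triple] = field(default_factory=set)

    def update(self, new_triples: Set[Triple]) -> int:
        """Add triples and return how many were previously unseen.

        The returned count serves as an illustrative intrinsic reward:
        the agent is rewarded for learning new facts about the world.
        """
        unseen = new_triples - self.triples
        self.triples |= unseen
        return len(unseen)


@dataclass
class BottleneckDetector:
    """Flags a bottleneck when the best score stalls for `patience` steps.

    This stalling rule is an assumption made for the sketch.
    """
    patience: int
    best_score: float = float("-inf")
    steps_since_improvement: int = 0

    def step(self, score: float) -> bool:
        if score > self.best_score:
            self.best_score = score
            self.steps_since_improvement = 0
        else:
            self.steps_since_improvement += 1
        return self.steps_since_improvement >= self.patience


kg = KnowledgeGraph()
first_reward = kg.update({("you", "in", "kitchen"), ("lamp", "in", "kitchen")})
second_reward = kg.update({("you", "in", "kitchen"), ("you", "have", "lamp")})

detector = BottleneckDetector(patience=3)
flags = [detector.step(score) for score in [0, 10, 10, 10, 10]]
```

In this toy run the second observation repeats one known fact, so only the new triple counts toward the intrinsic reward, and the detector fires once the score has sat at 10 for three consecutive steps. In a policy-chaining scheme of the kind the abstract describes, such a signal could mark the point at which the current policy is frozen and a new module starts exploring.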
