Learning Dynamic Belief Graphs to Generalize on Text-Based Games

Ashutosh Adhikari, Xingdi Yuan, Marc-Alexandre Côté, Mikuláš Zelinka, Marc-Antoine Rondeau, Romain Laroche, Pascal Poupart, Jian Tang, Adam Trischler, William L. Hamilton

TL;DR

This work tackles learning and generalizing in text-based games by introducing GATA, a graph-aided transformer agent that learns latent, dynamic belief graphs from raw text and uses them to plan actions. The agent combines a graph updater, which maintains a continuous, multi-relational belief graph, with an action selector that fuses graph and text representations for decision making. It is trained with reinforcement learning and self-supervised pretraining: observation generation (OG) and contrastive observation classification (COC). Empirical results on 500+ TextWorld games show that GATA outperforms strong text-based baselines by an average of 24.2% and can approach the performance of agents with access to ground-truth graphs, illustrating the value of graph-structured representations for memory and planning under partial observability. The work also provides ablations with discrete and full-graph variants, probing analyses, and broader-impact considerations, highlighting both the potential and the challenges of graph-based reasoning in language-grounded RL.

Abstract

Playing text-based games requires skills in processing natural language and sequential decision making. Achieving human-level performance on text-based games remains an open challenge, and prior research has largely relied on hand-crafted structured representations and heuristics. In this work, we investigate how an agent can plan and generalize in text-based games using graph-structured representations learned end-to-end from raw text. We propose a novel graph-aided transformer agent (GATA) that infers and updates latent belief graphs during planning to enable effective action selection by capturing the underlying game dynamics. GATA is trained using a combination of reinforcement and self-supervised learning. Our work demonstrates that the learned graph-based representations help agents converge to better policies than their text-only counterparts and facilitate effective generalization across game configurations. Experiments on 500+ unique games from the TextWorld suite show that our best agent outperforms text-based baselines by an average of 24.2%.

Paper Structure

This paper contains 52 sections, 15 equations, 18 figures, 9 tables, 1 algorithm.

Figures (18)

  • Figure 1: GATA playing a text-based game by updating its belief graph. In response to action $A_{t-1}$, the environment returns text observation $O_t$. Based on $O_t$ and $\mathcal{G}_{t-1}$, the agent updates $\mathcal{G}_{t}$ and selects a new action $A_{t}$. In the figure, the blue box with squares is the game engine, the green box with diamonds is the graph updater, and the red box with slashes is the action selector.
  • Figure 2: GATA in detail. The coloring scheme is the same as in Figure 1. The graph updater first generates $\Delta g_t$ using $\mathcal{G}_{t-1}$ and $O_t$. Afterwards, the action selector uses $O_t$ and the updated graph $\mathcal{G}_t$ to select $A_t$ from the list of action candidates $C_t$. The purple dotted line indicates a detached connection (i.e., no back-propagation through that connection).
  • Figure 3: Left: Training curves on 20 level 2 games (averaged over 3 seeds). Right: Density comparison between a ground-truth graph (binary) and a belief graph $\mathcal{G}$ generated by the COC pre-training procedure. Both matrices are slices of adjacency tensors corresponding to the "is" relation.
  • Figure 4: Observation generation model.
  • Figure 5: Contrastive observation classification model.
  • ...and 13 more figures
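
The interaction loop described in the captions of Figures 1 and 2 can be sketched roughly as follows. This is a toy illustration only, not the authors' implementation: the entity and relation vocabularies, the keyword-matching update, the fixed increment of 0.5, the scoring rule, and the function names `graph_updater`, `action_selector`, and `play` are all assumptions made for the example. The observations are also scripted here, whereas in the paper the game engine returns $O_t$ in response to $A_{t-1}$. The learned neural components are replaced by hand-written stand-ins so that the data flow ($\mathcal{G}_{t-1}, O_t \rightarrow \mathcal{G}_t$, then $O_t, \mathcal{G}_t, C_t \rightarrow A_t$) is easy to follow.

```python
import math

# Hypothetical vocabularies, chosen only for this illustration.
ENTITIES = ("player", "kitchen", "apple")
RELATIONS = ("at", "is")


def graph_updater(prev_graph, observation):
    """Toy stand-in for the graph updater: maps (G_{t-1}, O_t) to G_t.

    The belief graph is a dict from (relation, source, target) to a
    real-valued edge weight, mimicking a continuous multi-relational graph.
    """
    words = set(observation.lower().split())
    mentioned = [e for e in ENTITIES if e in words]
    new_graph = dict(prev_graph)
    for rel in RELATIONS:
        for src in mentioned:
            for dst in mentioned:
                if src != dst:
                    key = (rel, src, dst)
                    # Assumed update: add 0.5, then squash into (-1, 1).
                    new_graph[key] = math.tanh(new_graph.get(key, 0.0) + 0.5)
    return new_graph


def action_selector(observation, graph, candidates):
    """Toy stand-in for the action selector: picks A_t from C_t using O_t and G_t."""
    obs_words = set(observation.lower().split())

    def score(action):
        action_words = set(action.lower().split())
        # Graph evidence: total weight of edges touching entities in the action.
        graph_score = sum(
            weight
            for (_, src, dst), weight in graph.items()
            if src in action_words or dst in action_words
        )
        # Text evidence: word overlap between the action and the observation.
        text_score = len(action_words & obs_words)
        return graph_score + text_score

    return max(candidates, key=score)


def play(observations, candidate_lists):
    """Run the update-then-select loop over scripted observations."""
    graph = {}  # empty belief graph before the first observation
    actions = []
    for obs, candidates in zip(observations, candidate_lists):
        graph = graph_updater(graph, obs)                            # G_t
        actions.append(action_selector(obs, graph, candidates))      # A_t
    return graph, actions


if __name__ == "__main__":
    final_graph, chosen = play(
        ["You are in the kitchen . There is an apple here ."],
        [["take apple", "go north"]],
    )
    print(chosen)
```

In this sketch the graph is updated before the action is chosen at every step, which mirrors the order given in the Figure 2 caption; everything inside the two functions is a placeholder for the learned modules.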