STARLING: Self-supervised Training of Text-based Reinforcement Learning Agent with Large Language Models

Shreyas Basavatia, Keerthiram Murugesan, Shivam Ratnakar

TL;DR

STARLING presents a scalable, self-supervised approach to training text-based reinforcement learning (TBRL) agents by automatically generating diverse training games with a large language model (GPT-3) and an interactive fiction engine (Inform7). A structured, slot-filled prompting pipeline with k-shot examples produces JSON game specifications that are compiled into playable Glulx games, so the agent can be pretrained on skills such as cooking and cleaning before transfer to target environments. Across TextWorld Commonsense, ScienceWorld, and Zork1, a GRU-based TBRL agent pretrained with STARLING consistently outperforms vanilla training and approaches human performance on several tasks, indicating improved generalization and robustness. The work highlights the potential of LLM-generated auxiliary environments as a scalable sandbox for self-supervised text-based RL, while acknowledging limitations in navigation-heavy tasks and the need for end-to-end automation in future work.
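
The slot-filled, k-shot prompting step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the template wording, the `Example` class, the `build_prompt` and `parse_spec` helpers, and the JSON keys in `REQUIRED_KEYS` are all hypothetical names chosen for the example, and no LLM call or Inform7 compilation is performed.

```python
import json
from dataclasses import dataclass

# Hypothetical template: slot names and wording are assumptions for illustration.
PROMPT_TEMPLATE = (
    "Game idea: {idea}\n"
    "Skill: {skill}\n"
    "Respond with a JSON object containing the keys: {keys}.\n"
)

# Hypothetical schema for a game specification (not taken from the paper).
REQUIRED_KEYS = ("goal", "rooms", "objects", "actions")


@dataclass
class Example:
    """One solved example shown to the model (one of the k shots)."""
    idea: str
    skill: str
    spec: dict


def build_prompt(examples, idea, skill):
    """Concatenate k solved examples, then the query with its answer left open."""
    keys = ", ".join(REQUIRED_KEYS)
    parts = []
    for ex in examples:
        parts.append(PROMPT_TEMPLATE.format(idea=ex.idea, skill=ex.skill, keys=keys))
        parts.append(json.dumps(ex.spec) + "\n\n")
    parts.append(PROMPT_TEMPLATE.format(idea=idea, skill=skill, keys=keys))
    return "".join(parts)


def parse_spec(raw):
    """Parse the model's reply and reject specifications with missing keys."""
    spec = json.loads(raw)
    missing = [k for k in REQUIRED_KEYS if k not in spec]
    if missing:
        raise ValueError(f"spec is missing keys: {missing}")
    return spec
```

In such a setup, the string returned by `build_prompt` would be sent to the language model, and only replies that pass `parse_spec` would be handed on to the game compiler. Validating before compilation keeps malformed generations from reaching the interactive fiction engine.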

Abstract

Interactive fiction games have emerged as an important application for improving the generalization capabilities of language-based reinforcement learning (RL) agents. Existing environments for interactive fiction games are domain-specific or time-consuming to generate, and they do not train RL agents to master a specific set of skills. In this work, we introduce STARLING, an interactive environment for self-supervised RL in text-based games. STARLING bootstraps text-based RL agents with games generated automatically from a seed set of game ideas, boosting their performance and their ability to generalize when pursuing a goal in the target environment. These games let the agent hone its skills on a predefined set of tasks. We create and test an environment of 100 games generated with this automated framework, which combines a large language model (GPT-3) with an interactive fiction game engine (based on Inform7) and lets the user generate more games under minimal human supervision. Experimental results from both human participants and baseline text-based RL agents reveal that current state-of-the-art text-based RL agents cannot use previously learned skills in new situations at the level humans can. These results reinforce STARLING's potential to serve as a sandbox environment for further research in self-supervised text-based RL.

Paper Structure

This paper contains 26 sections, 13 figures, 3 tables.

Figures (13)

  • Figure 1: Architecture diagram for Self-supervised Text-based Reinforcement Learning using LLM (STARLING).
  • Figure 2: Workflow of the STARLING game generator using a large language model (GPT-3).
  • Figure 3: (A) GPT-3 input prompt for cooking games with one action example. The actual prompt contains four action examples. (B) GPT-3 output for the cooking pasta game idea. GPT-3 reliably outputs accurate and necessary game information in a form very similar to the input.
  • Figure 4: Training curves for the pre-training step of STARLING depicting the normalized scores (left) and number of moves taken (right) of text-based reinforcement learning agents.
  • Figure 5: Training curves for TWC easy (left), medium (middle), and hard (right) games depicting the normalized scores (top) and number of moves (bottom) of both vanilla TBRL and STARLING agents.
  • ...and 8 more figures