
Exploration Based Language Learning for Text-Based Games

Andrea Madotto, Mahdi Namazifar, Joost Huizinga, Piero Molino, Adrien Ecoffet, Huaixiu Zheng, Alexandros Papangelis, Dian Yu, Chandra Khatri, Gokhan Tur

TL;DR

This work addresses the challenge of solving text-based games with huge action spaces by decoupling exploration from policy learning. It adapts the Go-Explore framework to extract high-reward trajectories in text domains, then trains a Seq2Seq imitation learner on those trajectories to map observations to actions. The results show that Go-Explore substantially improves exploration efficiency (e.g., it discovers optimal trajectories in fewer environment interactions) and that Seq2Seq imitation from Go-Explore trajectories generalizes better than existing methods to unseen CookingWorld games, including in zero-shot settings. The study also discusses limitations of its state representations and suggests future directions, such as hierarchical encoders and language-model-assisted action generation, to further improve generalization and scalability.
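The exploration phase summarized above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `ToyTextGame` is a hypothetical three-room game standing in for an environment such as CoinCollector, and the cell representation (observation text plus score), the count-based selection weight, and drawing random actions from the game's list of valid commands are all assumptions of this sketch. "Returning" to a cell is done by resetting the game and replaying the cell's stored command sequence, which presumes a deterministic game.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple


class ToyTextGame:
    """Hypothetical deterministic game: walk east twice, then take the coin."""

    ROOMS = ["kitchen", "hallway", "pantry"]

    def reset(self) -> str:
        self.room, self.score, self.done = 0, 0, False
        return self._obs()

    def _obs(self) -> str:
        return f"You are in the {self.ROOMS[self.room]}."

    def admissible_commands(self) -> List[str]:
        cmds = ["look"]
        if self.room > 0:
            cmds.append("go west")
        if self.room < len(self.ROOMS) - 1:
            cmds.append("go east")
        else:
            cmds.append("take coin")  # the coin is in the last room
        return cmds

    def step(self, command: str) -> Tuple[str, int, bool]:
        if command == "go east" and self.room < len(self.ROOMS) - 1:
            self.room += 1
        elif command == "go west" and self.room > 0:
            self.room -= 1
        elif command == "take coin" and self.room == len(self.ROOMS) - 1:
            self.score, self.done = 1, True
        return self._obs(), self.score, self.done


@dataclass
class Cell:
    trajectory: List[str]  # commands that reach this cell from reset()
    score: int
    done: bool = False
    visits: int = 0


def cell_key(obs: str, score: int) -> Tuple[str, int]:
    # Assumed cell representation: observation text plus current score.
    return (obs, score)


def go_explore(env: ToyTextGame, iterations: int = 200,
               explore_steps: int = 5, seed: int = 0) -> Cell:
    rng = random.Random(seed)
    archive: Dict[Tuple[str, int], Cell] = {
        cell_key(env.reset(), 0): Cell(trajectory=[], score=0)}
    for _ in range(iterations):
        # Select a non-terminal cell, favouring rarely chosen ones.
        keys = [k for k, c in archive.items() if not c.done]
        weights = [1.0 / (1 + archive[k].visits) ** 0.5 for k in keys]
        cell = archive[rng.choices(keys, weights=weights)[0]]
        cell.visits += 1
        # "Go": return to the cell by replaying its commands (needs determinism).
        env.reset()
        for cmd in cell.trajectory:
            env.step(cmd)
        # "Explore": take random admissible commands from that state.
        traj = list(cell.trajectory)
        for _ in range(explore_steps):
            cmd = rng.choice(env.admissible_commands())
            obs, score, done = env.step(cmd)
            traj.append(cmd)
            key = cell_key(obs, score)
            old = archive.get(key)
            # The key includes the score, so a known cell is only
            # overwritten when this route to it is shorter.
            if old is None or len(traj) < len(old.trajectory):
                archive[key] = Cell(list(traj), score, done,
                                    old.visits if old else 0)
            if done:
                break
    # Return the highest-scoring, shortest trajectory in the archive.
    return max(archive.values(), key=lambda c: (c.score, -len(c.trajectory)))


if __name__ == "__main__":
    best = go_explore(ToyTextGame())
    print(best.score, best.trajectory)
```

Keeping only the shortest route per cell is what makes the final archive a source of compact, high-reward demonstrations for the imitation phase.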

Abstract

This work presents an exploration and imitation-learning-based agent capable of state-of-the-art performance in playing text-based computer games. Text-based computer games describe their world to the player through natural language and expect the player to interact with the game using text. These games are of interest as they can be seen as a testbed for language understanding, problem-solving, and language generation by artificial agents. Moreover, they provide a learning environment in which these skills can be acquired through interactions with an environment rather than using fixed corpora. One aspect that makes these games particularly challenging for learning agents is the combinatorially large action space. Existing methods for solving text-based games are limited to games that are either very simple or have an action space restricted to a predetermined set of admissible actions. In this work, we propose to use the exploration approach of Go-Explore for solving text-based games. More specifically, in an initial exploration phase, we first extract trajectories with high rewards, after which we train a policy to solve the game by imitating these trajectories. Our experiments show that this approach outperforms existing solutions in solving text-based games, and it is more sample efficient in terms of the number of interactions with the environment. Moreover, we show that the learned policy can generalize better than existing solutions to unseen games without using any restriction on the action space.
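The imitation phase starts from the trajectories that exploration returns. One plausible way to turn them into training pairs for a Seq2Seq learner is sketched below. The whitespace tokenizer, the special tokens, and using the current observation alone as the source sequence are assumptions made for illustration, and the three-step trajectory is toy data, not output from the paper's experiments.

```python
from typing import Dict, List, Tuple

PAD, BOS, EOS, UNK = "<pad>", "<bos>", "<eos>", "<unk>"


def tokenize(text: str) -> List[str]:
    # Assumed tokenizer: lowercase, split off full stops, split on whitespace.
    return text.lower().replace(".", " .").split()


def build_vocab(steps: List[Tuple[str, str]]) -> Dict[str, int]:
    vocab = {PAD: 0, BOS: 1, EOS: 2, UNK: 3}
    for obs, cmd in steps:
        for tok in tokenize(obs) + tokenize(cmd):
            vocab.setdefault(tok, len(vocab))  # next free id for unseen tokens
    return vocab


def encode(text: str, vocab: Dict[str, int], add_bos: bool = False) -> List[int]:
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokenize(text)]
    return ([vocab[BOS]] if add_bos else []) + ids + [vocab[EOS]]


def make_seq2seq_examples(steps: List[Tuple[str, str]],
                          vocab: Dict[str, int]) -> List[Tuple[List[int], List[int]]]:
    """(observation, command) steps -> (source_ids, target_ids) training pairs.

    The target starts with <bos> so it can serve as teacher-forced decoder input.
    """
    return [(encode(obs, vocab), encode(cmd, vocab, add_bos=True))
            for obs, cmd in steps]


# Toy trajectory (illustrative only, not data from the paper).
trajectory = [
    ("You are in the kitchen.", "go east"),
    ("You are in the hallway.", "go east"),
    ("You are in the pantry.", "take coin"),
]
vocab = build_vocab(trajectory)
examples = make_seq2seq_examples(trajectory, vocab)
print(examples[0])  # → ([4, 5, 6, 7, 8, 9, 2], [1, 10, 11, 2])
```

Because each target is a token sequence rather than an index into a fixed list of commands, a decoder trained on such pairs generates commands word by word, which is how a policy of this kind can operate without restricting itself to a predetermined set of admissible actions.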

Paper Structure

This paper contains 27 sections, 3 equations, 11 figures, 3 tables.

Figures (11)

  • Figure 1: CoinCollector results of DQN++ and DRQN++ yuan2018counting versus Go-Explore Phase 1, i.e., exploration only.
  • Figure 2: Breakdown of results for the CookingWorld games in the Joint setting. Results are normalized, averaged over the 20 games at each difficulty level, and sorted by increasing difficulty from left to right.
  • Figure 3: High-level intuition of the Go-Explore algorithm. Figure taken from ecoffet2019go with permission.
  • Figure 4: LSTM-DQN high-level schema.
  • Figure 5: DRRN high-level schema.
  • ...and 6 more figures