EXPLORER: Exploration-guided Reasoning for Textual Reinforcement Learning

Kinjal Basu; Keerthiram Murugesan; Subhajit Chaudhury; Murray Campbell; Kartik Talamadupula; Tim Klinger

TL;DR

EXPLORER is a neuro-symbolic, exploration-guided reasoning agent for textual reinforcement learning: a neural module handles exploration and a symbolic module handles exploitation. It learns generalized symbolic policies and performs well on unseen data.

Abstract

Text-based games (TBGs) have emerged as an important collection of NLP tasks, requiring reinforcement learning (RL) agents to combine natural language understanding with reasoning. A key challenge for agents attempting to solve such tasks is to generalize across multiple games and perform well on both seen and unseen objects. Purely deep-RL-based approaches may perform well on seen objects, but they fail to match that performance on unseen objects. Commonsense-infused deep-RL agents may work better on unseen data, but their policies are often neither interpretable nor easily transferable. To tackle these issues, we present EXPLORER, an exploration-guided reasoning agent for textual reinforcement learning. EXPLORER is neuro-symbolic in nature: it relies on a neural module for exploration and a symbolic module for exploitation. It can also learn generalized symbolic policies and perform well on unseen data. Our experiments show that EXPLORER outperforms the baseline agents on Text-World Cooking (TW-Cooking) and Text-World Commonsense (TWC) games.
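The split between symbolic exploitation and neural exploration described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the rule format (object name mapped to target location), the four-word action template, and the names `symbolic_candidates` and `select_action` are all assumptions made for illustration, and the `explore` callback stands in for the neural module.

```python
from typing import Callable, Dict, List

# Hypothetical rule format: object name -> location where it belongs.
Rules = Dict[str, str]


def symbolic_candidates(admissible: List[str], rules: Rules) -> List[str]:
    """Return the admissible actions that some learned rule supports."""
    matches = []
    for action in admissible:
        words = action.split()
        # Assumed action template: "<verb> <object> <preposition> <location>".
        if len(words) == 4 and rules.get(words[1]) == words[3]:
            matches.append(action)
    return matches


def select_action(
    admissible: List[str],
    rules: Rules,
    explore: Callable[[List[str]], str],
) -> str:
    """Exploit a rule-backed action if one exists, otherwise explore."""
    candidates = symbolic_candidates(admissible, rules)
    if candidates:
        return candidates[0]
    # No rule applies yet: defer to the exploration policy (the neural
    # module in the paper; any callable in this sketch).
    return explore(admissible)
```

In this sketch, a rule such as `{"apple": "fridge"}` makes the agent prefer `insert apple into fridge` whenever that action is admissible; in states where no rule matches, the choice falls back to whatever the exploration policy returns.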

Paper Structure (13 sections, 2 equations, 8 figures, 2 tables, 1 algorithm)

Introduction
Background
Symbolic Policy Learner
Learning Symbolic Policy using ILP
Exception Learning
Rule Generalization
Dynamic Rule Generalization
Experiments and Results
Dataset
Experiments
Results
Related Work
Future Work and Conclusion

Figures (8)

Figure 1: An overview of the EXPLORER agent's dataflow on a TWC game. In EXPLORER, the neural module is responsible for exploration and collects <action, state, reward> tuples, whereas the symbolic module learns the rules and performs the exploitation using commonsense knowledge from WordNet.
Figure 2: Overview of EXPLORER's decision-making at any given time step. The hybrid neuro-symbolic architecture consists of five main modules: (a) the Context Encoder encodes the observation into a dynamic context, (b) the Action Encoder encodes the admissible actions, (c) the Neural Action Selector combines (a) and (b) with the $\bigoplus$ operator, (d) the Symbolic Action Selector returns a set of candidate actions, and (e) the Symbolic Rule Learner uses ILP and WordNet-based rule generalization to generate symbolic rules.
Figure 3: Entity extraction using Action Template
Figure 4: ILP Rule Learning Example
Figure 5: Example of Rule Generalization
...and 3 more figures
