Guiding Reinforcement Learning Exploration Using Natural Language
Brent Harrison, Upol Ehsan, Mark O. Riedl
TL;DR
This work tackles the challenge of helping reinforcement learning generalize to unseen environments by using natural language as a generalizable source of action advice. It combines encoder-decoder language models with a modified policy shaping framework, so that a language-based critique trained offline can guide exploration during online learning. Empirical results in Frogger show faster convergence and better performance for agents using language-based critique, including under noisy natural language conditions, compared to Q-learning and observation-based critique. The approach points to a practical way of reducing human effort through offline natural language supervision and suggests potential extensions to more complex, cross-domain tasks.
Abstract
In this work we present a technique that uses natural language to help reinforcement learning generalize to unseen environments. This technique uses neural machine translation, specifically encoder-decoder networks, to learn associations between natural language behavior descriptions and state-action information. We then use this learned model to guide agent exploration through a modified version of policy shaping, making the agent more effective at learning in unseen environments. We evaluate this technique using the arcade game Frogger under ideal and non-ideal conditions. This evaluation shows that our modified policy shaping algorithm improves over a Q-learning agent as well as a baseline version of policy shaping.
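To make the idea of guiding exploration with policy shaping more concrete, here is a minimal sketch of how a critique distribution over actions might be combined with a Q-learning agent's own action distribution. It illustrates the multiplicative combination commonly associated with standard policy shaping, not the modified algorithm the abstract refers to, whose details are not given here. The names `boltzmann`, `shape_policy`, and `critique_probs` are hypothetical; `critique_probs` stands in for whatever per-action probabilities a learned language model would supply.

```python
import math


def boltzmann(q_values, temperature=1.0):
    """Turn Q-values into an action distribution (softmax with a temperature)."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def shape_policy(q_values, critique_probs, temperature=1.0):
    """Combine the agent's distribution with a critique distribution.

    Assumption: critique_probs holds one non-negative weight per action,
    e.g. derived from a language model's advice. This is an illustrative
    stand-in, not the paper's exact formulation.
    """
    p_q = boltzmann(q_values, temperature)
    combined = [pq * pc for pq, pc in zip(p_q, critique_probs)]
    total = sum(combined)
    if total == 0.0:
        # The critique rules out every action: fall back to the agent's own policy.
        return p_q
    return [c / total for c in combined]
```

With equal Q-values the agent has no preference of its own, so the combined distribution simply follows the critique; as Q-values become more distinct, the agent's experience carries more weight in the product.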
