Guiding Reinforcement Learning Exploration Using Natural Language

Brent Harrison, Upol Ehsan, Mark O. Riedl

TL;DR

This work tackles the challenge of generalizing reinforcement learning to unseen environments by using natural language as a generalizable source of action advice. It combines encoder-decoder language models with a modified policy shaping framework, so that offline language-based critique can guide exploration during online learning. Empirical results in Frogger show that language-based critique agents converge faster and perform better than Q-learning and observation-based critique agents, including when the natural language is noisy. The approach points to a practical way of reducing human effort through offline natural-language supervision and suggests potential extensions to more complex, cross-domain tasks.

Abstract

In this work we present a technique that uses natural language to help reinforcement learning generalize to unseen environments. The technique uses neural machine translation, specifically encoder-decoder networks, to learn associations between natural language behavior descriptions and state-action information. We then use the learned model to guide agent exploration through a modified version of policy shaping, making the agent more effective at learning in unseen environments. We evaluate the technique on the popular arcade game Frogger under ideal and non-ideal conditions. The evaluation shows that our modified policy shaping algorithm improves over a Q-learning agent as well as a baseline version of policy shaping.
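
For readers unfamiliar with policy shaping, the following is a minimal sketch of the standard formulation (Griffith et al., 2013), not the modified version this paper proposes, whose details are not given on this page. It assumes critique arrives as a net count per action (positive minus negative feedback) and that a `consistency` parameter expresses how much the critique is trusted. The function names, the `deltas` input, and the parameter values are all hypothetical. In this paper's setting the critique signal would come from the learned language model rather than from a human, but how that signal is produced is not shown here.

```python
import math


def critique_probabilities(deltas, consistency=0.9):
    """Probability that each action is good, given net critique counts.

    deltas maps action -> (positive feedback count - negative feedback count).
    consistency is the assumed reliability of the critique, in (0, 1).
    """
    probs = {}
    for action, delta in deltas.items():
        good = consistency ** delta
        bad = (1.0 - consistency) ** delta
        probs[action] = good / (good + bad)
    return probs


def boltzmann(q_values, temperature=1.0):
    """Softmax distribution over actions derived from Q-values."""
    best = max(q_values.values())  # subtract the max for numerical stability
    exps = {a: math.exp((q - best) / temperature) for a, q in q_values.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


def shaped_policy(q_values, deltas, consistency=0.9, temperature=1.0):
    """Combine the Q-value policy and the critique policy by multiplication."""
    p_q = boltzmann(q_values, temperature)
    p_c = critique_probabilities(deltas, consistency)
    # Actions without any critique get a neutral probability of 0.5.
    combined = {a: p_q[a] * p_c.get(a, 0.5) for a in p_q}
    total = sum(combined.values())
    return {a: p / total for a, p in combined.items()}


# Hypothetical example: Q-values are uninformative, critique favors "up".
q_values = {"up": 0.0, "down": 0.0, "left": 0.0, "right": 0.0}
deltas = {"up": 2, "down": -1}
policy = shaped_policy(q_values, deltas)
```

With uninformative Q-values, the critique alone determines the ordering: the agent is most likely to explore "up", least likely to explore "down", and treats the uncritiqued actions identically.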

Paper Structure

This paper contains 20 sections, 3 equations, 4 figures.

Figures (4)

  • Figure 1: High-level flowchart of our technique.
  • Figure 2: (a) The Frogger map used for training. (b) The 25% map used for testing. (c) The 50% map used for testing. (d) The 75% map used for testing.
  • Figure 3: Learning rates for agents on deterministic versions of (a) the 25% map, (b) the 50% map, and (c) the 75% map.
  • Figure 4: Learning rates for agents on stochastic versions of (a) the 25% map, (b) the 50% map, and (c) the 75% map.