ACTRCE: Augmenting Experience via Teacher's Advice For Multi-Goal Reinforcement Learning

Harris Chan, Yuhuai Wu, Jamie Kiros, Sanja Fidler, Jimmy Ba

TL;DR

ACTRCE addresses sparse rewards by replacing fixed state-space goals with natural language goals described by a teacher. It extends Hindsight Experience Replay by relabeling episodes using language-described goals and using hindsight advice as additional reward signals. Empirical results on KrazyGrid World and ViZDoom show faster learning, strong compositional task performance, and zero-shot generalization to unseen lexicons with pre-trained embeddings. The findings highlight the practicality of language-grounded goals and minimal teacher feedback for robust multi-goal RL.

Abstract

Sparse reward is one of the most challenging problems in reinforcement learning (RL). Hindsight Experience Replay (HER) attempts to address this issue by converting a failed experience to a successful one by relabeling the goals. Despite its effectiveness, HER has limited applicability because it lacks a compact and universal goal representation. We present Augmenting experienCe via TeacheR's adviCE (ACTRCE), an efficient reinforcement learning technique that extends the HER framework using natural language as the goal representation. We first analyze the differences among goal representations and show that ACTRCE can efficiently solve difficult reinforcement learning problems in challenging 3D navigation tasks, whereas HER with a non-language goal representation failed to learn. We also show that with language goal representations, the agent can generalize to unseen instructions, and even to instructions with unseen lexicons. We further demonstrate that it is crucial to use hindsight advice to solve challenging tasks, and that even a small amount of advice is sufficient for the agent to achieve good performance.
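
The relabeling idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Transition` record, the `relabel_with_advice` helper, the `toy_teacher` function, the grid coordinates, the instruction strings, and the binary reward of 1.0 on the final step are all assumptions made for illustration. The sketch only shows the general pattern of copying a failed episode and replacing its goal with a teacher's language description of what the agent actually achieved.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = Tuple[int, int]


@dataclass(frozen=True)
class Transition:
    """One step of experience, conditioned on a language goal (hypothetical record)."""
    state: State
    action: str
    next_state: State
    goal: str      # natural-language instruction the agent was conditioned on
    reward: float


def relabel_with_advice(
    episode: List[Transition],
    teacher: Callable[[State], List[str]],
) -> List[Transition]:
    """Return hindsight copies of `episode`, one copy per piece of teacher advice.

    The teacher describes, in language, which goals the final state satisfies.
    Each copy swaps the original goal for that description and gives a reward
    of 1.0 on the last step only (an assumed sparse, binary reward).
    """
    if not episode:
        return []
    advice = teacher(episode[-1].next_state)
    last = len(episode) - 1
    relabeled: List[Transition] = []
    for goal in advice:
        for i, t in enumerate(episode):
            reward = 1.0 if i == last else 0.0
            relabeled.append(
                Transition(t.state, t.action, t.next_state, goal, reward)
            )
    return relabeled


def toy_teacher(state: State) -> List[str]:
    """Hypothetical teacher: names the object located at `state`, if any."""
    objects = {(2, 0): "reach the blue torch", (0, 3): "reach the red armor"}
    return [objects[state]] if state in objects else []


# A failed episode: the agent was told to reach the red armor
# but ended at the blue torch instead.
episode = [
    Transition((0, 0), "right", (1, 0), "reach the red armor", 0.0),
    Transition((1, 0), "right", (2, 0), "reach the red armor", 0.0),
]
hindsight = relabel_with_advice(episode, toy_teacher)
replay_buffer = episode + hindsight  # original and relabeled experience together
```

In this toy run the failed two-step episode yields two extra transitions whose goal is "reach the blue torch", with the reward placed on the final step. Because the goal is a sentence rather than a state vector, the same relabeling works for any outcome the teacher can describe.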

Paper Structure

This paper contains 50 sections, 3 equations, 17 figures, 5 tables, 1 algorithm.

Figures (17)

  • Figure 1: The diagram illustrates our model architecture.
  • Figure 2: Performance in average success rate during training, comparing between different sentence embedding methods for the single target and composition task on ViZDoom.
  • Figure 3: Comparison among different sentence embedding methods showing the pairwise correlation between the sentence embedding vectors for each of the singleton instructions. The darker the colour, the higher the correlation.
  • Figure 4: Performance comparisons on KGW and ViZDoom environments. The success rates are calculated over all desired goals and 16 different environments. The shaded area represents the standard deviation over 2 random seeds.
  • Figure 5: ViZDoom experiment with 5 objects in easy mode for single target case.
  • ...and 12 more figures