Some Considerations on Learning to Explore via Meta-Reinforcement Learning

Bradly C. Stadie; Ge Yang; Rein Houthooft; Xi Chen; Yan Duan; Yuhuai Wu; Pieter Abbeel; Ilya Sutskever

TL;DR

This work addresses how meta-reinforcement learning agents can optimize their own data sampling to improve fast adaptation across tasks. It introduces E-MAML and E-RL^2, algorithms that explicitly account for how initial task samples influence post-adaptation returns, and discusses flexible inner-update operators and practical training tricks. The Krazy World benchmark and maze experiments demonstrate that exploration-aware meta-learning can achieve faster initial gains and stronger final performance than standard MAML or RL^2, with results highlighting the importance of system identification and memory. The findings point to future work on curiosity signals and intrinsic rewards to further enhance long-horizon exploration in meta-learning.

Abstract

We consider the problem of exploration in meta-reinforcement learning. Two new meta-reinforcement learning algorithms are suggested: E-MAML and E-$\text{RL}^2$. Results are presented on a novel environment we call 'Krazy World' and a set of maze environments. We show E-MAML and E-$\text{RL}^2$ deliver better performance on tasks where exploration is important.
