Some Considerations on Learning to Explore via Meta-Reinforcement Learning
Bradly C. Stadie, Ge Yang, Rein Houthooft, Xi Chen, Yan Duan, Yuhuai Wu, Pieter Abbeel, Ilya Sutskever
TL;DR
This work addresses how meta-reinforcement learning agents can optimize their own data sampling to improve fast adaptation across tasks. It introduces E-MAML and E-$\text{RL}^2$, algorithms that explicitly account for how initial task samples influence post-adaptation returns, and discusses flexible inner-update operators and practical training tricks. The Krazy World benchmark and maze experiments demonstrate that exploration-aware meta-learning can achieve faster initial gains and stronger final performance than standard MAML or $\text{RL}^2$, with results highlighting the importance of system identification and memory. The findings point to future work on curiosity signals and intrinsic rewards to further enhance long-horizon exploration in meta-learning.
Abstract
We consider the problem of exploration in meta-reinforcement learning. We propose two new meta-reinforcement learning algorithms: E-MAML and E-$\text{RL}^2$. Results are presented on a novel environment we call 'Krazy World' and on a set of maze environments. We show that E-MAML and E-$\text{RL}^2$ deliver better performance on tasks where exploration is important.
