A Tutorial on Meta-Reinforcement Learning
Jacob Beck, Risto Vuorio, Evan Zheran Liu, Zheng Xiong, Luisa Zintgraf, Chelsea Finn, Shimon Whiteson
TL;DR
The paper surveys meta-reinforcement learning (meta-RL), which learns the RL algorithm itself so that an agent can adapt quickly to new tasks drawn from a task distribution. It categorizes problem settings by few-shot versus many-shot regimes and by multi-task versus single-task scenarios, and covers three core inner-loop parameterizations: parameterized policy gradients, black-box sequence models, and task inference. Canonical methods such as MAML and RL^2 are introduced, along with extensions covering exploration strategies, supervision regimes, and model-based variants, as well as theoretical analyses via Bayes-adaptive and other frameworks. The authors discuss applications in robotics and multi-agent RL, and identify open problems including generalization to broader task distributions, benchmarks, and the integration of offline data. The goal is to guide practitioners and researchers toward robust, generalizable meta-RL methods and to chart directions for future work.
Abstract
While deep reinforcement learning (RL) has fueled multiple high-profile successes in machine learning, it is held back from more widespread adoption by its often poor data efficiency and the limited generality of the policies it produces. A promising approach for alleviating these limitations is to cast the development of better RL algorithms as a machine learning problem itself in a process called meta-RL. Meta-RL is most commonly studied in a problem setting where, given a distribution of tasks, the goal is to learn a policy that is capable of adapting to any new task from the task distribution with as little data as possible. In this survey, we describe the meta-RL problem setting in detail as well as its major variations. We discuss how, at a high level, meta-RL research can be clustered based on the presence of a task distribution and the learning budget available for each individual task. Using these clusters, we then survey meta-RL algorithms and applications. We conclude by presenting the open problems on the path to making meta-RL part of the standard toolbox for a deep RL practitioner.
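The problem setting described in the abstract has two levels: an inner loop that adapts to a single task, and an outer loop that tunes the inner loop so that adaptation works well on average over the task distribution. The toy sketch below illustrates only that structure and is not a method from the survey. It assumes a distribution over two-armed Bernoulli bandits, an epsilon-greedy inner loop whose single meta-parameter is `epsilon`, and an outer loop that is a plain grid search rather than gradient-based meta-learning such as MAML. All function names and constants are hypothetical.

```python
import random


def sample_task(rng):
    """Sample one task: a two-armed Bernoulli bandit with random arm means."""
    return [rng.random(), rng.random()]


def inner_loop(task, epsilon, rng, num_steps=20):
    """Adapt to a single task with epsilon-greedy; return the total reward."""
    counts = [0, 0]
    sums = [0.0, 0.0]
    total = 0.0
    for _ in range(num_steps):
        # Explore with probability epsilon, or while some arm is still untried.
        if rng.random() < epsilon or 0 in counts:
            arm = rng.randrange(2)
        else:
            means = [sums[a] / counts[a] for a in range(2)]
            arm = 0 if means[0] >= means[1] else 1
        reward = 1.0 if rng.random() < task[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total


def meta_objective(epsilon, num_tasks=200, seed=0):
    """Average return of the inner loop over tasks sampled from the distribution."""
    rng = random.Random(seed)
    returns = [inner_loop(sample_task(rng), epsilon, rng) for _ in range(num_tasks)]
    return sum(returns) / num_tasks


def outer_loop(candidates=(0.0, 0.1, 0.3, 1.0)):
    """Pick the meta-parameter with the best average return (grid search)."""
    return max(candidates, key=meta_objective)


if __name__ == "__main__":
    best = outer_loop()
    print("selected epsilon:", best)
    print("average return:", meta_objective(best))
```

In this sketch the short 20-step budget per task plays the role of the few-shot setting: the outer loop is judged by how much reward the inner loop collects while it is still adapting. Actual meta-RL methods replace the single scalar `epsilon` with learned parameters of a policy or a sequence model.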
