Meta-Learning and Meta-Reinforcement Learning - Tracing the Path towards DeepMind's Adaptive Agent

Björn Hoppmann; Christoph Scholz

TL;DR

The paper addresses rapid adaptation to novel tasks by providing a rigorous task-based formalism for meta-learning and meta-RL and by tracing a timeline of landmark algorithms from MAML to DeepMind's Adaptive Agent (ADA). It unifies these approaches under a two-stage framework: inner task-specific learning guided by a meta-parameter $\varphi$ and outer meta-optimization via a meta-loss, with explicit definitions for performance measures like $\mathcal{L}_{gen}^{meta}$ and adaptation speed. Its main contributions are the formal derivations of meta-learning paradigms, the comprehensive timeline of gradient-based, memory-based, and transformer-based meta-RL methods, and the analysis of techniques such as distillation and automated curriculum learning that scale to generalist agents. The work highlights the shift toward large-scale foundation-model-like meta-RL systems and discusses practical implications, benchmarking challenges, and ethical considerations as these agents move toward real-world deployment.

Abstract

Humans are highly effective at utilizing prior knowledge to adapt to novel tasks, a capability that standard machine learning models struggle to replicate due to their reliance on task-specific training. Meta-learning overcomes this limitation by allowing models to acquire transferable knowledge from various tasks, enabling rapid adaptation to new challenges with minimal data. This survey provides a rigorous, task-based formalization of meta-learning and meta-reinforcement learning and uses that paradigm to chronicle the landmark algorithms that paved the way for DeepMind's Adaptive Agent, consolidating the essential concepts needed to understand the Adaptive Agent and other generalist approaches.

Paper Structure (24 sections, 26 equations, 7 figures, 1 table)

Introduction
Paradigm
Meta-Learning
Meta-Reinforcement Learning
Performance Measures
The Timeline of meta-Reinforcement Learning Landmarks
Gradient-based Meta-Learning
Memory-based Meta-RL
Task-Inference Meta-RL
Transformer-based Meta-RL
The Adaptive Agent
Automated Curriculum Learning
Distillation
Discussion
Conclusion
...and 9 more sections

Figures (7)

Figure 1: Meta-learning of 2-way 1-shot animal classification tasks. The current meta-knowledge $\varphi$ is the prior for one-shot learning of each particular classification task. During meta-training, the meta-optimizer receives all $N$ query set losses of the adapted models to update meta-knowledge $\varphi$. Meta-validation evaluates the training progress on new classification problems every $l$ meta-epochs, while meta-testing on unseen classifications takes place after meta-training.
Figure 2: General Meta-Training. In each iteration a new task $T_i$ is sampled from the family $p(T)$. The meta-variable $\varphi$ is the prior for individual $K$ shot fine-tuning of each task. The resulting parameters $\theta_i'$ of each task are used to update $\varphi$.
Figure 3: General Meta-Testing Paradigm. For each $T_j$ sampled from the set of test tasks the parameters $\theta_j(\varphi)$ are fine-tuned in $K$ shots, before the resulting $\theta_j'$ get evaluated on the task's test set via the task-specific loss $\mathcal{L}_j$ to yield the performance.
Figure 4: Meta-Reinforcement learning to race on tracks with varying weather conditions. Starting from the meta-knowledge $\varphi$, $K$ episodes of fine-tuning on the particular racing track yield performance measures. The meta-optimizer uses these measures to update meta-knowledge $\varphi$. Meta-validation evaluates the training progress on unseen racing tracks every $l$ meta-epochs. After meta-training, the meta-policy adapts to test tracks to evaluate the quality of prior $\varphi$.
Figure 5: The MAML meta-training scheme.
...and 2 more figures
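
The meta-training and meta-testing loops described in the captions of Figures 2 and 3 can be illustrated with a small sketch. It is not the paper's algorithm: the toy task family (1-D linear regression with a random slope), the first-order approximation of the outer gradient, and all function names and hyperparameters (`sample_task`, `adapt`, `meta_train`, `meta_test`, the learning rates) are assumptions chosen for illustration. The scalar `phi` plays the role of the meta-parameter $\varphi$, and `adapt` stands in for the $K$-shot fine-tuning that produces $\theta_i'$.

```python
import random

def mse(theta, xs, ys):
    # Task-specific loss: mean squared error of the model y_hat = theta * x.
    return sum((theta * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def mse_grad(theta, xs, ys):
    # Gradient of the mean squared error with respect to theta.
    return sum(2.0 * (theta * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def sample_task(rng):
    # Hypothetical task family p(T): each task is a slope drawn from U(2, 4).
    return rng.uniform(2.0, 4.0)

def sample_data(slope, k, rng):
    # Draw k noise-free (x, y) pairs from the task y = slope * x.
    xs = [rng.uniform(-1.0, 1.0) for _ in range(k)]
    ys = [slope * x for x in xs]
    return xs, ys

def adapt(phi, xs, ys, inner_lr, steps=1):
    # Inner loop: start from the prior phi and fine-tune on the K-shot support set.
    theta = phi
    for _ in range(steps):
        theta -= inner_lr * mse_grad(theta, xs, ys)
    return theta

def meta_train(iterations=2000, k=5, inner_lr=0.1, outer_lr=0.01, seed=0):
    # Outer loop: update phi using the query loss of the adapted parameters.
    rng = random.Random(seed)
    phi = 0.0
    for _ in range(iterations):
        slope = sample_task(rng)
        support = sample_data(slope, k, rng)
        query = sample_data(slope, k, rng)
        theta_i = adapt(phi, *support, inner_lr)
        # First-order approximation (an assumption of this sketch): the query
        # gradient is taken at theta_i and the dependence of theta_i on phi
        # is ignored.
        phi -= outer_lr * mse_grad(theta_i, *query)
    return phi

def meta_test(phi, n_tasks=200, k=5, inner_lr=0.1, seed=1):
    # Meta-testing: adapt to freshly sampled tasks in K shots, then average
    # the task loss of the adapted parameters on a held-out test set.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_tasks):
        slope = sample_task(rng)
        support = sample_data(slope, k, rng)
        test = sample_data(slope, 50, rng)
        total += mse(adapt(phi, *support, inner_lr), *test)
    return total / n_tasks

if __name__ == "__main__":
    phi = meta_train()
    print("learned prior:", phi)
    print("post-adaptation loss from learned prior:", meta_test(phi))
    print("post-adaptation loss from phi = 0:", meta_test(0.0))
```

In this toy setting the learned prior should settle near the centre of the slope distribution, so one inner step from it leaves a much smaller test loss than one inner step from an uninformed start. A full second-order treatment would additionally differentiate through `adapt`.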
