Hierarchical Transformers are Efficient Meta-Reinforcement Learners

Gresa Shala; André Biedenkapp; Josif Grabocka

TL;DR

This work introduces Hierarchical Transformers for Meta-Reinforcement Learning (HTrMRL), a powerful online meta-reinforcement learning approach that not only enhances the agent's ability to generalize from limited data but also paves the way for more robust and versatile AI systems.

Abstract

We introduce Hierarchical Transformers for Meta-Reinforcement Learning (HTrMRL), a powerful online meta-reinforcement learning approach. HTrMRL aims to address the challenge of enabling reinforcement learning agents to perform effectively on previously unseen tasks. We demonstrate how past episodes serve as a rich source of information, which our model distills and applies to new contexts. Our learned algorithm outperforms the previous state of the art, meta-trains more efficiently, and significantly improves generalization. Experimental results across various simulated tasks of the Meta-World benchmark indicate a significant improvement in learning efficiency and adaptability over the state of the art. Our approach not only enhances the agent's ability to generalize from limited data but also paves the way for more robust and versatile AI systems.

Paper Structure

This paper contains 23 sections, 4 equations, 25 figures, and 1 table.

Introduction
Related Work
Preliminaries
Deeper Memory Mechanisms in Transformer-Based Meta-RL
The Intra-Episodic Memory Hurdle
Hierarchical Transformers for Meta-RL
Experiments
Experimental Protocol
Meta-World
Performance metrics
Hypotheses and Results
Hypothesis 1: HTrMRL captures more general task features through its ability to capture intra- as well as inter-episode experiences and outperforms baselines that only use intra-episode experiences.
Hypothesis 3: HTrMRL outperforms the state-of-the-art online meta-RL in out-of-distribution (OOD) tasks.
Ablations
Conclusion
...and 8 more sections

Figures (25)

Figure 1: Illustration of the setting of the ML10 benchmark in Meta-World. We evaluate HTrMRL on each task and collect these frames. On the left are the tasks HTrMRL is trained on; on the right we show 3 frames (beginning, middle, and end of the episode) from evaluating on 2 of the 5 test tasks. Although these tasks are not in the training set, HTrMRL generalizes and successfully closes the door and sweeps the block into the hole.
Figure 2: Illustration of our HTrMRL architecture. We store the $K$ past episodes (E) and sample one transition sequence of length $S$ from each episode, including a sequence from the current episode $E_K$ that ends with the current state. Transition sequences store the observed state $s_t$, action $a_t$, and reward $r_t$, as well as a termination flag $d_t$ that indicates whether the episode had already terminated at time $t$. By passing each sequence independently through a first group of Transformer Encoder Blocks, we generate a feature vector for each sequence. We then pass these sequence representations through a second group of Transformer Encoder Blocks to generate feature vectors that augment the state as input to the policy $\pi$.
Figure 3: t-SNE plots of the output embeddings of HTrMRL (left) and TrMRL (right) for five of the tasks in the training set of the ML10 benchmark of Meta-World. For better visibility, we only plot 5 tasks here but provide a version with all 10 tasks in a separate figure.
Figure 4: Meta-train and meta-test performance, in terms of average success rate, of HTrMRL, TrMRL, PEARL, MAML TRPO, RL2 PPO, and VariBAD on the ML1 benchmark, for training (left) and testing (right) on parametric variations of the Push task.
Figure 5: Meta-train and meta-test performance, in terms of average success rate, of HTrMRL, TrMRL, PEARL, MAML TRPO, RL2 PPO, and VariBAD on the ML10 (left) and ML45 (right) benchmarks. Here, training and testing use disjoint sets of tasks.
...and 20 more figures
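The two-level encoder described in the Figure 2 caption can be sketched as a NumPy forward pass. This is a minimal illustration under stated assumptions, not the authors' implementation: weights are random and untrained, attention is single-head, each window is summarized by a mean over its time steps, the policy feature is taken from the inter-episode output at the current-episode slot, and the names `sample_sequences` and `HierarchicalEncoder` are hypothetical. Only the overall flow follows the caption: one length-$S$ window of $[s_t, a_t, r_t, d_t]$ transitions per stored episode plus one ending at the current state, intra-sequence encoder blocks applied to each window independently, then inter-sequence encoder blocks whose output is concatenated with the state as input to $\pi$.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # Normalize over the feature axis (no learned gain/bias in this sketch).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class EncoderBlock:
    """Single-head Transformer encoder block: self-attention + ReLU MLP, post-norm.

    Forward pass only, random weights. Accepts inputs of shape (..., L, d) and
    attends over the L axis, so leading axes act as an independent batch.
    """

    def __init__(self, d, rng):
        s = 1.0 / np.sqrt(d)
        self.wq, self.wk, self.wv, self.wo = (rng.normal(0.0, s, (d, d)) for _ in range(4))
        self.w1 = rng.normal(0.0, s, (d, 2 * d))
        self.w2 = rng.normal(0.0, 1.0 / np.sqrt(2 * d), (2 * d, d))

    def __call__(self, x):
        q, k, v = x @ self.wq, x @ self.wk, x @ self.wv
        scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(x.shape[-1])
        x = layer_norm(x + (softmax(scores) @ v) @ self.wo)
        return layer_norm(x + np.maximum(x @ self.w1, 0.0) @ self.w2)


def sample_sequences(past_episodes, current_episode, S, rng):
    """Stack one length-S transition window per past episode plus the current one.

    Each episode is an array (T, transition_dim) whose rows are [s_t, a_t, r_t, d_t].
    Past windows start at a random offset; the current window is the last S rows.
    """
    windows = []
    for ep in past_episodes:
        start = rng.integers(0, len(ep) - S + 1)  # upper bound is exclusive
        windows.append(ep[start:start + S])
    windows.append(current_episode[-S:])  # ends at the most recent step
    return np.stack(windows)  # shape (K, S, transition_dim)


class HierarchicalEncoder:
    """Intra-episode blocks encode each window; inter-episode blocks mix window summaries."""

    def __init__(self, transition_dim, d, S, n_blocks, rng):
        self.embed = rng.normal(0.0, 1.0 / np.sqrt(transition_dim), (transition_dim, d))
        self.pos = rng.normal(0.0, 0.02, (S, d))  # ASSUMPTION: learned-style positional embedding
        self.intra = [EncoderBlock(d, rng) for _ in range(n_blocks)]
        self.inter = [EncoderBlock(d, rng) for _ in range(n_blocks)]

    def __call__(self, seqs):
        h = seqs @ self.embed + self.pos  # (K, S, d)
        for blk in self.intra:
            h = blk(h)  # attention only within each window; windows batched on axis 0
        z = h.mean(axis=1)  # (K, d) ASSUMPTION: mean pooling gives one vector per window
        for blk in self.inter:
            z = blk(z)  # attention across the K window summaries
        return z[-1]  # (d,) ASSUMPTION: output at the current-episode slot feeds the policy


# Usage with hypothetical sizes and random placeholder data.
rng = np.random.default_rng(0)
state_dim, action_dim = 6, 2
transition_dim = state_dim + action_dim + 2  # + reward r_t and termination flag d_t
S, d = 8, 16

past = [rng.normal(size=(T, transition_dim)) for T in (20, 25, 30)]  # 3 stored episodes
current = rng.normal(size=(12, transition_dim))

seqs = sample_sequences(past, current, S, rng)
encoder = HierarchicalEncoder(transition_dim, d, S, n_blocks=2, rng=rng)
feature = encoder(seqs)
state = current[-1, :state_dim]
policy_input = np.concatenate([state, feature])  # state augmented with the context feature
```

One consequence of batching the windows along the first axis is that intra-episode attention costs O(K·S²) instead of the O((K·S)²) a single flat Transformer over all K·S transitions would need, while the inter-episode stage attends over only K summaries; this is offered as a plausible reading of the two-level design, not a claim from the paper.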
