Meta Reinforcement Learning with Latent Variable Gaussian Processes

Steindór Sæmundsson, Katja Hofmann, Marc Peter Deisenroth

TL;DR

The paper tackles data inefficiency in reinforcement learning by transferring knowledge across related tasks. The authors propose a probabilistic meta-learning framework that represents task differences as latent variables and conditions a Gaussian process dynamics model on them, using model predictive control (MPC) for planning. They develop a sparse variational GP to scale inference and to support online updating of the latent embeddings. Experiments on cart-pole and double-pendulum swing-up show improved predictive accuracy, interpretable latent embeddings, and substantial data-efficiency gains, including up to a ~60% reduction in average interaction time on unseen tasks.

Abstract

Learning from small data sets is critical in many practical applications where data collection is time-consuming or expensive, e.g., robotics, animal experiments or drug design. Meta learning is one way to increase the data efficiency of learning algorithms by generalizing learned concepts from a set of training tasks to unseen, but related, tasks. Often, this relationship between tasks is hard-coded or relies in some other way on human expertise. In this paper, we frame meta learning as a hierarchical latent variable model and infer the relationship between tasks automatically from data. We apply our framework in a model-based reinforcement learning setting and show that our meta-learning model effectively generalizes to novel tasks by identifying how new tasks relate to prior ones from minimal data. This results in up to a 60% reduction in the average interaction time needed to solve tasks compared to strong baselines.
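The latent-variable idea in the abstract can be illustrated with a small sketch. This is not the authors' ML-GP implementation. It assumes a toy setting in which every task shares one function and differs only by a fixed offset. It uses an exact GP rather than a sparse variational one, sets the training-task latents by hand instead of inferring them, and picks the latent for a new task by a simple grid search over the predictive likelihood of a single observation. All names (`rbf`, `gp_predict`, `augment`, `log_lik`) and all numeric settings are hypothetical choices for illustration.

```python
import numpy as np


def rbf(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)


def gp_predict(X, y, Xs, noise=1e-2):
    """Exact GP posterior mean and predictive variance at inputs Xs."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, Ks)
    mean = Ks.T @ alpha
    var = np.diag(rbf(Xs, Xs)) - np.sum(v**2, axis=0) + noise
    return mean, var


def augment(x, h):
    """Append a scalar task latent h to every 1-D input in x."""
    return np.column_stack([x, np.full(len(x), h)])


rng = np.random.default_rng(0)

# Assumed toy tasks: shared structure sin(x) plus a task-specific offset.
train_offsets = [-1.0, -0.5, 0.0, 0.5, 1.0]
x_grid = np.linspace(-3.0, 3.0, 15)

# Simplification: training latents are fixed to the offsets, not inferred.
X = np.vstack([augment(x_grid, off) for off in train_offsets])
y = np.concatenate(
    [np.sin(x_grid) + off + 0.05 * rng.standard_normal(len(x_grid))
     for off in train_offsets]
)

# A single observation from an unseen task whose true offset is 0.25.
x_new = np.array([0.3])
y_new = np.sin(x_new) + 0.25


def log_lik(h):
    """Gaussian log predictive density of the new observation given latent h."""
    m, v = gp_predict(X, y, augment(x_new, h))
    return float(np.sum(-0.5 * np.log(2.0 * np.pi * v) - 0.5 * (y_new - m) ** 2 / v))


# Grid search over candidate latents for the new task.
grid = np.linspace(-1.5, 1.5, 61)
h_hat = grid[np.argmax([log_lik(h) for h in grid])]
print("estimated latent for the unseen task:", h_hat)
```

Once `h_hat` is fixed, predictions for the unseen task at any input reuse the shared GP through `gp_predict(X, y, augment(x_query, h_hat))`, which is the sense in which one observation can place a new task relative to the training tasks.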

Paper Structure

This paper contains 24 sections, 13 equations, 7 figures, 2 tables, 2 algorithms.

Figures (7)

  • Figure 1: Graphical model for our $\textnormal{ML-GP}$ model.
  • Figure 2: The figure shows six unknown tasks (toy examples) with a shared structure (the same function) and task-specific variation (fixed offset). The $\textnormal{ML-GP}$ model is able to disentangle the two automatically given the training data (black discs), as demonstrated by the training prediction curves. It also infers a reasonable value for the offset given a single observation from each unseen test task (orange discs) and can use the global structure to generalize predictive performance on those tasks.
  • Figure 3: Mean and two standard deviation confidence error-bars of the RMSE and NLL for the $\textnormal{ML-GP}$, $\textnormal{SGP}$ and the standard GP model as a function of the number of inducing points. The $\textnormal{ML-GP}$ significantly outperforms both baselines.
  • Figure 4: One-step predictions of the angular velocity in cart-pole. The figure shows the true data points (discs) and the predictive distributions with a two standard deviation confidence interval for the $\textnormal{ML-GP}$, $\textnormal{SGP}$ and a standard GP. The $\textnormal{ML-GP}$ generalizes well to new tasks; both the $\textnormal{SGP}$ and GP baselines are overly confident.
  • Figure 5: Latent space embedding of cart-pole configurations/tasks. The figure shows the mean (discs) of the inferred latent variables and two standard deviation error bars. Filled discs are training tasks and empty discs are held out test tasks. The colors of the discs represent the length and the colors of the dotted lines between discs represent the mass.
  • ...and 2 more figures