Meta Reinforcement Learning with Latent Variable Gaussian Processes

Steindór Sæmundsson; Katja Hofmann; Marc Peter Deisenroth

TL;DR

Reinforcement learning is data-inefficient, and this paper addresses that by transferring knowledge across related tasks. The authors propose a probabilistic meta-learning framework that represents the differences between tasks as latent variables and conditions a Gaussian process dynamics model on them, with planning done by model predictive control (MPC). A sparse variational GP keeps inference tractable and lets the latent embedding of a new task be updated online as data arrives. Experiments on cart-pole swing-up and double-pendulum swing-up show improved predictive accuracy, interpretable latent embeddings, and up to a 60% reduction in average interaction time on unseen tasks compared with strong baselines.

Abstract

Learning from small data sets is critical in many practical applications where data collection is time-consuming or expensive, e.g., robotics, animal experiments, or drug design. Meta learning is one way to increase the data efficiency of learning algorithms by generalizing learned concepts from a set of training tasks to unseen, but related, tasks. Often, this relationship between tasks is hard-coded or relies in some other way on human expertise. In this paper, we frame meta learning as a hierarchical latent variable model and infer the relationship between tasks automatically from data. We apply our framework in a model-based reinforcement learning setting and show that our meta-learning model effectively generalizes to novel tasks by identifying how new tasks relate to prior ones from minimal data. This results in up to a 60% reduction in the average interaction time needed to solve tasks compared to strong baselines.
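
The core idea in the abstract, a dynamics model shared across tasks and conditioned on a per-task latent variable, can be illustrated with a small NumPy sketch. Everything here is an assumption made for illustration: the toy one-dimensional dynamics (`toy_delta`), the hand-set latent values for the training tasks, the fixed kernel hyperparameters, and the grid search used to pick the latent of the unseen task. The paper instead infers latents with a sparse variational GP, so this is a simplified stand-in, not the authors' method.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def gp_mean(X_train, y_train, X_test, noise=1e-2):
    """Posterior mean of a zero-mean GP with Gaussian observation noise."""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    return rbf_kernel(X_test, X_train) @ np.linalg.solve(K, y_train)

def gp_inputs(states, actions, h):
    """Stack (state, action, latent) rows; the task latent h is shared by all rows."""
    return np.column_stack([states, actions, np.full(len(states), h)])

def toy_delta(actions, gain):
    """Hypothetical toy dynamics: the state change is gain * action."""
    return gain * actions

# Two training tasks. Latents are hand-set here (assumption); the paper infers them.
rng = np.random.default_rng(0)
train_tasks = {-1.0: 0.5, 1.0: 1.5}  # latent value -> action gain of that task
X_parts, y_parts = [], []
for h, gain in train_tasks.items():
    s = rng.uniform(-1, 1, 30)
    a = rng.uniform(-1, 1, 30)
    X_parts.append(gp_inputs(s, a, h))
    y_parts.append(toy_delta(a, gain))
X_train = np.vstack(X_parts)
y_train = np.concatenate(y_parts)

# An unseen task with gain 1.4, observed through only five transitions.
s_new = np.linspace(-0.5, 0.5, 5)
a_new = np.linspace(-1.0, 1.0, 5)
y_new = toy_delta(a_new, gain=1.4)

# Pick the latent that best explains the new transitions under the shared GP.
# Grid search is a simplified stand-in for the paper's variational inference.
candidates = np.linspace(-1.5, 1.5, 31)
errors = [
    np.sum((gp_mean(X_train, y_train, gp_inputs(s_new, a_new, h)) - y_new) ** 2)
    for h in candidates
]
h_hat = float(candidates[int(np.argmin(errors))])
print("inferred latent:", h_hat)
```

Because the GP is shared and only the latent changes, a handful of transitions from the new task is enough to place it relative to the training tasks. Here the unseen gain is closer to that of the task with latent 1.0, so the selected latent should fall on that side.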
