Adaptive World Models: Learning Behaviors by Latent Imagination Under Non-Stationarity

Emiliyan Gospodinov, Vaisakh Shaj, Philipp Becker, Stefan Geyer, Gerhard Neumann

TL;DR

This work introduces the Hidden Parameter-POMDP, a new formalism designed for control with adaptive world models. It enables learning robust behaviors across a variety of non-stationary RL benchmarks and learns task abstractions in an unsupervised manner.

Abstract

Developing foundational world models is a key research direction for embodied intelligence, with the ability to adapt to non-stationary environments being a crucial criterion. In this work, we introduce a new formalism, Hidden Parameter-POMDP, designed for control with adaptive world models. We demonstrate that this approach enables learning robust behaviors across a variety of non-stationary RL benchmarks. Additionally, this formalism effectively learns task abstractions in an unsupervised manner, resulting in structured, task-aware latent spaces.

Paper Structure

This paper contains 42 sections, 5 equations, 14 figures, 1 table, 1 algorithm.

Figures (14)

  • Figure 1: Given a set of N transitions, the deep set encoder emits, for each transition, a latent representation and its corresponding uncertainty. The set of latent representations is then aggregated via Bayesian aggregation to infer $p \left( \bm{l} \mid \bm{C}_{l} \right)$.
  • Figure 2: Hidden Parameter RSSM: The latent task variable is inferred from context $\bm{C}_{l}$ via Bayesian aggregation. Solid lines indicate the generative process and dashed lines the inference model. Modifications from \cite{hafner2019dream} are shown in red.
  • Figure 3: Performance of HalfCheetah and Hopper agents under changing dynamics caused by joint perturbations and body mass inertia variations, respectively.
  • Figure 4: Performance comparison of HalfCheetah and Walker agents under different changing-reward scenarios (changing target velocities and skills).
  • Figure 5: 2D projections of learned latent state spaces on DMC Cheetah learning 4 skills, Table \ref{table:dmc_multi_task_benchmarks}.
  • ...and 9 more figures
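
The Bayesian aggregation step named in Figures 1 and 2 can be illustrated with a minimal sketch. It assumes a factorized (diagonal) Gaussian prior over the latent task variable and treats each encoder output as a noisy Gaussian observation of it, which is the usual closed-form setting for this kind of aggregation. The function name `bayesian_aggregation` and all argument names are hypothetical and not taken from the paper, and the paper's actual model may differ in its details.

```python
from typing import List, Sequence, Tuple


def bayesian_aggregation(
    latent_obs: Sequence[Sequence[float]],
    obs_vars: Sequence[Sequence[float]],
    prior_mean: Sequence[float],
    prior_var: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Fuse N latent observations into a diagonal Gaussian posterior.

    Assumption (not from the paper): every dimension is independent, so the
    posterior precision is the prior precision plus the sum of the
    observation precisions, and the posterior mean is the
    precision-weighted average of prior mean and observations.
    """
    post_mean: List[float] = []
    post_var: List[float] = []
    for d, (mu0, var0) in enumerate(zip(prior_mean, prior_var)):
        precision = 1.0 / var0        # start from the prior precision
        weighted_sum = mu0 / var0     # precision-weighted prior mean
        for r, v in zip(latent_obs, obs_vars):
            precision += 1.0 / v[d]       # each context element adds precision
            weighted_sum += r[d] / v[d]   # and a precision-weighted observation
        post_var.append(1.0 / precision)
        post_mean.append(weighted_sum / precision)
    return post_mean, post_var


# Two one-dimensional latent observations with unit variance, standard-normal prior.
mean, var = bayesian_aggregation(
    latent_obs=[[1.0], [3.0]],
    obs_vars=[[1.0], [1.0]],
    prior_mean=[0.0],
    prior_var=[1.0],
)

# An empty context leaves the prior unchanged.
empty_mean, empty_var = bayesian_aggregation([], [], [0.5], [2.0])
```

In this sketch, observations with larger variance contribute less to the posterior, and the posterior variance shrinks as more transitions are added to the context, which is how the per-transition uncertainties in Figure 1 would enter the estimate.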

Theorems & Definitions (1)

  • Definition 2.1