Simplifying Latent Dynamics with Softly State-Invariant World Models

Tankred Saanum; Peter Dayan; Eric Schulz

TL;DR

This work introduces the Parsimonious Latent Space Model (PLSM), a world model that regularizes its latent dynamics so that the effects of the agent's actions become more predictable, making the model softly state-invariant.

Abstract

To solve control problems via model-based reasoning or planning, an agent needs to know how its actions affect the state of the world. The actions an agent has at its disposal often change the state of the environment in systematic ways. However, existing techniques for world modelling do not guarantee that the effects of actions are represented in such systematic ways. We introduce the Parsimonious Latent Space Model (PLSM), a world model that regularizes the latent dynamics to make the effect of the agent's actions more predictable. Our approach minimizes the mutual information between latent states and the change that an action produces in the agent's latent state, in turn minimizing the dependence of the dynamics on the state. This makes the world model softly state-invariant. We combine PLSM with different model classes used for i) future latent state prediction, ii) planning, and iii) model-free reinforcement learning. We find that our regularization improves accuracy, generalization, and performance in downstream tasks, highlighting the importance of systematic treatment of actions in world models.
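
The objective described in the abstract can be sketched as follows. The notation is an assumption made here for illustration and is not quoted from the paper: $\mathbf{z}_t$ is the latent state, $\Delta_t$ is the change an action produces in the latent state, $\mathcal{L}_{\text{pred}}$ stands for the prediction loss of whichever model class PLSM is combined with, and $\beta > 0$ is a hypothetical regularization weight. The abstract does not say whether the mutual information term is conditioned on the action, so the unconditioned form is shown.

$$
\mathcal{L} = \mathcal{L}_{\text{pred}} + \beta \, I(\mathbf{z}_t ; \Delta_t), \qquad \Delta_t = \mathbf{z}_{t+1} - \mathbf{z}_t
$$

Under this reading, a larger $\beta$ leaves $\Delta_t$ carrying less information about $\mathbf{z}_t$. The penalty discourages state dependence without forbidding it, which is one way to understand the word "softly" in "softly state-invariant".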

Paper Structure (27 sections, 13 equations, 15 figures, 3 tables)

Introduction
Latent dynamics
Parsimonious latent dynamics
Parsimonious dynamics for Reinforcement Learning
Model-based RL
Distracting visual control
Model-free RL
Future state prediction
PLSM improves long-horizon prediction accuracy
Generalization and robustness
Related work
Conclusion
Mutual Information minimization
PLSM vs L1 and L2 norm regularization
Ablations
...and 12 more sections

Figures (15)

Figure 1: Overview: World models are commonly used to predict latent trajectories, predict sequences of pixel observations, and perform planning. We propose an architecture together with an information bottleneck for learning simple and parsimonious world models. Our method relies on a query network that extracts a sparse representation $\mathbf{h}_t$ for predicting latent transition dynamics. Combining our method with auxiliary loss functions for i) contrastive learning, ii) planning, and iii) model-free RL, we see consistent performance improvements in all domains. Lines and bars show mean performance from three sets of RL benchmarks. Error bars represent the 95% confidence interval.
Figure 2: The heart (left) can appear at any $x, y$ coordinate in a bounded two-dimensional latent space, in which it can transition in 9 different ways (moving in eight directions and standing still, for instance when moving into a boundary). Encouraging dynamics to be parsimonious recovers these 9 possible transitions (see right), whereas an unconstrained model (see center) does not.
Figure 3: PLSM, when incorporated into either the TD-MPC algorithm (A), or RePo (B), improves planning in continuous control tasks with high-dimensional and complex dynamics, and visual distractions, respectively. Lines show the average return attained across 15 evaluation episodes, averaged over five seeds. The shaded region represents the 95% confidence interval.
Figure 4: Changing the dynamics model in SPR to PLSM increases the score in several Atari games, with little implementation overhead. On average, human-normalized scores are higher when using PLSM dynamics. Bars show the difference in human-normalized score between SPR with and without PLSM dynamics, averaged over five seeds.
Figure 5: PLSM improves contrastive world models' accuracy in long-horizon latent prediction in five out of six environments. In the cubes and shapes dataset, the PLSM is close to perfect even when predicting as far as 10 timesteps into the future. Lines show accuracy on the entire test data, averaged over five random seeds. The shaded region corresponds to the standard error of the mean.
...and 10 more figures
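
The nine transitions mentioned in the Figure 2 caption can be illustrated with a small toy example. This is a minimal sketch under assumptions that are not taken from the paper: the space is discretized into a hypothetical 5x5 grid, there are eight directional actions, and a move that would leave the grid is blocked so the position stays unchanged. The names `GRID_SIZE`, `ACTIONS`, `step`, and `distinct_transitions` are illustrative only.

```python
from itertools import product

GRID_SIZE = 5  # hypothetical grid resolution, not taken from the paper

# Eight directional actions as (dx, dy) offsets; (0, 0) is excluded.
ACTIONS = [
    (dx, dy)
    for dx, dy in product((-1, 0, 1), repeat=2)
    if (dx, dy) != (0, 0)
]


def step(state, action, size=GRID_SIZE):
    """Return the next state; a move that would leave the grid is blocked."""
    x, y = state
    dx, dy = action
    nx, ny = x + dx, y + dy
    if 0 <= nx < size and 0 <= ny < size:
        return (nx, ny)
    # Assumption: a blocked move leaves the position unchanged.
    return (x, y)


def distinct_transitions(size=GRID_SIZE):
    """Collect every distinct state change over all states and actions."""
    deltas = set()
    for state in product(range(size), repeat=2):
        for action in ACTIONS:
            nx, ny = step(state, action, size)
            deltas.add((nx - state[0], ny - state[1]))
    return deltas


if __name__ == "__main__":
    deltas = distinct_transitions()
    print(len(deltas))  # 9: eight directions plus standing still
```

In this toy setting there are 200 state-action pairs (25 states times 8 actions) but only nine distinct state changes, and the state matters only at the boundary. This is the kind of regularity that a model with parsimonious dynamics is meant to capture.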
