
Dreamer-CDP: Improving Reconstruction-free World Models Via Continuous Deterministic Representation Prediction

Michael Hauri, Friedemann Zenke

TL;DR

This work introduces a JEPA-style predictor defined on continuous, deterministic representations and, with it, matches Dreamer's performance on Crafter, demonstrating effective world model learning on this benchmark without reconstruction objectives.

Abstract

Model-based reinforcement learning (MBRL) agents operating in high-dimensional observation spaces, such as Dreamer, rely on learning abstract representations for effective planning and control. Existing approaches typically employ reconstruction-based objectives in the observation space, which can render representations sensitive to task-irrelevant details. Recent alternatives trade reconstruction for auxiliary action prediction heads or view augmentation strategies, but perform worse in the Crafter environment than reconstruction-based methods. We close this gap between Dreamer and reconstruction-free models by introducing a JEPA-style predictor defined on continuous, deterministic representations. Our method matches Dreamer's performance on Crafter, demonstrating effective world model learning on this benchmark without reconstruction objectives.
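To make the idea of a JEPA-style predictor on continuous, deterministic representations concrete, here is a minimal NumPy sketch of the forward value of such a loss. Several assumptions apply and are not confirmed by the text above: the predictor is a single linear map (the hypothetical parameters `W`, `b`), the distance is a squared error summed over feature dimensions, and the inputs `h` and `u_next` are already time-aligned so that row t of `h` is the hidden state used to predict the next representation u_{t+1}. The function name `cdp_loss` is hypothetical.

```python
import numpy as np


def cdp_loss(h: np.ndarray, u_next: np.ndarray, W: np.ndarray, b: np.ndarray) -> float:
    """Hypothetical latent-prediction loss in the spirit of L_CDP.

    h:      (T, H) hidden states, row t used to predict u_{t+1} (assumed alignment)
    u_next: (T, D) continuous, deterministic target representations u_{t+1}
    W, b:   parameters of a stand-in linear predictor mapping H -> D

    In an autodiff framework the target u_next would typically be wrapped in a
    stop-gradient (an assumption here); NumPy only computes the forward value.
    """
    u_hat = h @ W + b                        # predicted \hat{u}_{t+1}
    err = u_hat - u_next                     # prediction error in representation space
    sq = np.sum(err ** 2, axis=-1)           # squared error per time step
    return float(np.mean(sq))                # average over the sequence


# Usage: a perfect predictor gives zero loss; a constant offset of 1 per
# feature gives a loss equal to the feature dimension D.
rng = np.random.default_rng(0)
T, H, D = 5, 8, 3
h = rng.normal(size=(T, H))
W = rng.normal(size=(H, D))
b = np.zeros(D)
perfect = cdp_loss(h, h @ W + b, W, b)
shifted = cdp_loss(h, h @ W + b + 1.0, W, b)
```

Unlike reconstruction, the error here is measured entirely in representation space, so no decoder back to pixel space is needed to produce the learning signal.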

Paper Structure

This paper contains 10 sections, 4 equations, 3 figures, and 2 tables.

Figures (3)

  • Figure 1: a) Schematic of Dreamer-CDP. The hidden state is passed through a predictor (green) trained to approximate the next continuous representation, $\hat{u}_{t+1} \approx u_{t+1}$. In Dreamer, the hidden state and the input embedding are instead used to predict the next input $x_{t+1}$ (dashed gray). b) Graphical model of Dreamer (left) and Dreamer-CDP (right), with losses shown in red. c) Visual examples when $\mathcal{L}_\mathrm{recon}$ (Dreamer), $\mathcal{L}_\mathrm{CDP}$ (Dreamer-CDP), or neither loss is applied. For the latter two, the decoder was trained independently, with detached gradients, for visualization purposes only.
  • Figure A.1: Crafter score for Dreamer (blue) and Dreamer-CDP (purple) for each achievement.
  • Figure A.2: Comparison between Dreamer-CDP (purple) and several ablations. Orange: ablation of the reward and value gradients used to train the world model; most of the learning signal comes from the latent-space predictive loss. Green: ablation of $\mathcal{L}_\mathrm{CDP}$ and $\mathcal{L}_\mathrm{dyn/rep}$; KL balancing is not sufficient to train the world model in the latent space. Black: the CDP loss alone also yields a lower cumulative return.
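The ablations in Figure A.2 switch individual world-model loss terms on and off. The sketch below shows, under stated assumptions, what the $\mathcal{L}_\mathrm{dyn/rep}$ (KL balancing) terms look like in the DreamerV3 formulation and how a switchable total loss could be assembled. Assumptions: categorical latents, the DreamerV3 defaults $\beta_\mathrm{dyn}=0.5$, $\beta_\mathrm{rep}=0.1$ and a 1-nat free-bits threshold (not confirmed for Dreamer-CDP), and the hypothetical names `kl_balancing`, `world_model_loss`, `use_cdp`, `use_kl`, and `l_heads`. The reward/value-gradient ablation is not modeled here.

```python
import numpy as np


def log_softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last axis."""
    x = x - np.max(x, axis=-1, keepdims=True)
    return x - np.log(np.sum(np.exp(x), axis=-1, keepdims=True))


def categorical_kl(p_logits: np.ndarray, q_logits: np.ndarray) -> np.ndarray:
    """KL(p || q) for independent categoricals of shape (batch, latents, classes),
    summed over latents and classes."""
    log_p, log_q = log_softmax(p_logits), log_softmax(q_logits)
    return np.sum(np.exp(log_p) * (log_p - log_q), axis=(-2, -1))


def kl_balancing(post_logits, prior_logits, beta_dyn=0.5, beta_rep=0.1, free_nats=1.0):
    """DreamerV3-style KL balancing (assumed defaults).

    L_dyn = max(free, KL[sg(post) || prior]) trains the prior;
    L_rep = max(free, KL[post || sg(prior)]) trains the posterior.
    Their forward values coincide; only the stop-gradient placement differs,
    which NumPy cannot express, so both use the same KL value here.
    """
    kl = categorical_kl(post_logits, prior_logits)
    l_dyn = np.maximum(free_nats, kl)
    l_rep = np.maximum(free_nats, kl)
    return float(np.mean(beta_dyn * l_dyn + beta_rep * l_rep))


def world_model_loss(l_cdp, l_kl, l_heads, use_cdp=True, use_kl=True):
    """Hypothetical switchable total loss; l_heads stands for prediction heads
    such as reward."""
    total = l_heads
    if use_cdp:
        total += l_cdp
    if use_kl:
        total += l_kl
    return total


# Identical posterior and prior: KL = 0, clipped up to the 1-nat floor.
same = kl_balancing(np.zeros((1, 4, 2)), np.zeros((1, 4, 2)))
# Near one-hot posterior vs. uniform prior over 4 binary latents: KL ~ 4 log 2.
post = np.tile(np.array([100.0, 0.0]), (1, 4, 1))
far = kl_balancing(post, np.zeros((1, 4, 2)))
```

Because the free-bits floor clips small KL values, the KL terms stop pushing posterior and prior together once they agree to within about one nat, which is one reason such terms alone may provide a weak training signal.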