Next Embedding Prediction Makes World Models Stronger

George Bredis; Nikita Balagansky; Daniil Gavrilov; Ruslan Rakhimov

TL;DR

NE-Dreamer is a decoder-free MBRL agent that uses a temporal transformer to predict next-step encoder embeddings from latent state sequences, directly optimizing temporal predictive alignment in representation space.

Abstract

Capturing temporal dependencies is critical for model-based reinforcement learning (MBRL) in partially observable, high-dimensional domains. We introduce NE-Dreamer, a decoder-free MBRL agent that leverages a temporal transformer to predict next-step encoder embeddings from latent state sequences, directly optimizing temporal predictive alignment in representation space. This approach enables NE-Dreamer to learn coherent, predictive state representations without reconstruction losses or auxiliary supervision. On the DeepMind Control Suite, NE-Dreamer matches or exceeds the performance of DreamerV3 and leading decoder-free agents. On a challenging subset of DMLab tasks involving memory and spatial reasoning, NE-Dreamer achieves substantial gains. These results establish next-embedding prediction with temporal transformers as an effective, scalable framework for MBRL in complex, partially observable environments.
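
The core idea in the abstract, aligning a prediction with the encoder embedding of the next step, can be illustrated with a small standard-library sketch. It is not the authors' implementation. The outline names Barlow Twins as the alignment loss, so the sketch uses the standard Barlow Twins objective (a cross-correlation matrix pushed toward the identity). The function names, the off-diagonal weight of 5e-3, the use of the time axis as the batch axis, and the random toy data are all assumptions made for illustration. The causal temporal transformer is replaced by stand-in prediction vectors.

```python
import math
import random


def next_step_pairs(states, embeddings):
    """Pair the context at step t with the encoder embedding at step t + 1."""
    return states[:-1], embeddings[1:]


def standardize(rows, eps=1e-6):
    """Normalize each feature to zero mean and unit variance across rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) + eps
        for j in range(d)
    ]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def barlow_twins_loss(pred, target, off_diag_weight=5e-3):
    """Barlow Twins loss: drive the cross-correlation matrix toward identity.

    off_diag_weight is an illustrative value, not taken from the paper.
    """
    n, d = len(pred), len(pred[0])
    zp, zt = standardize(pred), standardize(target)
    loss = 0.0
    for i in range(d):
        for j in range(d):
            # Cross-correlation between feature i of pred and feature j of target.
            c_ij = sum(zp[k][i] * zt[k][j] for k in range(n)) / n
            if i == j:
                loss += (1.0 - c_ij) ** 2  # matching features should correlate
            else:
                loss += off_diag_weight * c_ij ** 2  # other pairs should not
    return loss


# Toy data: T random "encoder embeddings" of dimension D (hypothetical values).
rng = random.Random(0)
T, D = 65, 4
embeddings = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(T)]
states = embeddings  # placeholder; a real agent would use RSSM latent states

contexts, targets = next_step_pairs(states, embeddings)

# Stand-ins for predictor outputs (no transformer is trained here):
# one close to the true next-step embedding, one unrelated to it.
good_pred = [[v + 0.01 * rng.gauss(0.0, 1.0) for v in row] for row in targets]
unrelated_pred = contexts  # same-step vectors; independent of targets in this toy data

aligned_loss = barlow_twins_loss(good_pred, targets)
mismatched_loss = barlow_twins_loss(unrelated_pred, targets)
```

In this toy setup, predictions that track the next-step embedding give a much lower loss than predictions unrelated to it, which is the signal a learned predictor would be trained on. A real agent would compute the loss over batches of sequences and backpropagate through the predictor and encoder. That step is omitted here.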

Paper Structure (28 sections, 16 equations, 7 figures, 1 table)

Introduction
Related Work
World models for pixel control.
Reconstruction-based world models.
Decoder-free world models.
Representation prediction and collapse prevention.
Method
Problem setup
Latent world model (RSSM)
Encoder and latent inference.
Reward and continuation heads.
World-model objective.
Next-embedding predictive alignment
Causal next-embedding predictor.
Alignment loss (Barlow Twins).
...and 13 more sections

Figures (7)

Figure 1: DMLab Benchmark Summary. Under matched compute and model capacity (50M environment steps; 5 seeds; 12M parameters), NE-Dreamer outperforms strong decoder-based (DreamerV3) and decoder-free world-model baselines (R2-Dreamer, DreamerPro) on the DMLab Rooms memory/navigation tasks.
Figure 2: Method overview. NE-Dreamer keeps Dreamer’s RSSM dynamics and imagination-based actor-critic, but replaces same-step pixel reconstruction with next-embedding prediction using a causal temporal transformer, improving long-horizon performance under partial observability.
Figure 3: DMLab Rooms: improved long-horizon memory/navigation. Under matched compute and model capacity ($50$M environment steps; 5 seeds; 12M parameters), NE-Dreamer outperforms strong decoder-based (DreamerV3) and decoder-free world-model baselines (R2-Dreamer, DreamerPro) on four Rooms tasks. The largest gains occur when success depends on maintaining state over long horizons rather than reacting to short-lived visual cues.
Figure 4: Mechanism on DMLab Rooms: predictive sequence modeling is the key. Under matched compute and model capacity ($50$M environment steps; $5$ seeds; mean$\pm$std), removing the causal temporal transformer (w/o transformer) or removing the next-step target shift (w/o shift) substantially reduces performance. Removing the lightweight projector (w/o projector) mainly affects optimization speed/stability, with smaller impact on final returns.
Figure 5: Post-hoc decoder reconstruction reveals temporal consistency. Rows show ground-truth observations (GT) and reconstructions from a post-hoc decoder trained on frozen latents. NE-Dreamer preserves task-relevant objects and spatial layout consistently over time (marked green circles), while same-timestep methods (Dreamer, R2-Dreamer) exhibit temporal inconsistency, where task-specific attributes appear transiently and then fade (marked red circles).
...and 2 more figures
