MuDreamer: Learning Predictive World Models without Reconstruction
Maxime Burchi, Radu Timofte
TL;DR
MuDreamer addresses DreamerV3's dependence on pixel reconstruction by learning a task-focused latent world model that predicts rewards, continuation, value, and actions. Inspired by MuZero, it uses an RSSM-based architecture with an action-prediction head and batch normalization to avoid representation collapse, achieving robust performance under visual distractions and competitive Atari100k results with faster training. The approach highlights the importance of KL balancing and demonstrates strong empirical gains on the DeepMind Visual Control Suite, including natural-background settings. By eliminating the reconstruction loss and decoder requirements, MuDreamer reduces architectural overhead while maintaining or improving control performance in continuous and discrete action spaces.
Abstract
The DreamerV3 agent recently demonstrated state-of-the-art performance in diverse domains, learning powerful world models in latent space using a pixel reconstruction loss. However, while the reconstruction loss is essential to Dreamer's performance, it also requires modeling information that is unnecessary for the task. Consequently, Dreamer sometimes fails to perceive elements that are crucial for solving the task when visual distractions are present in the observation, significantly limiting its potential. In this paper, we present MuDreamer, a robust reinforcement learning agent that builds upon the DreamerV3 algorithm by learning a predictive world model without the need to reconstruct input signals. Rather than relying on pixel reconstruction, hidden representations are learned by predicting the environment value function and previously selected actions. Similar to predictive self-supervised methods for images, we find that the use of batch normalization is crucial to prevent learning collapse. We also study the effect of KL balancing between model posterior and prior losses on convergence speed and learning stability. We evaluate MuDreamer on the commonly used DeepMind Visual Control Suite and demonstrate stronger robustness to visual distractions than DreamerV3 and other reconstruction-free approaches when the environment background is replaced with task-irrelevant real-world videos. Our method also achieves comparable performance on the Atari100k benchmark while benefiting from faster training.
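
To illustrate the KL balancing mentioned in the abstract, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes categorical latents parameterized by logits, and the common formulation in which the same KL divergence between posterior and prior is applied twice with stop-gradients, once to train the prior and once to regularize the posterior, each with its own weight. Because plain Python has no automatic differentiation, the gradients are written out by hand so the effect of the two weights is visible. The function names and the weight values (0.8 and 0.2) are hypothetical placeholders, not values reported in the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_categorical(p, q):
    """KL(p || q) for two categorical distributions given as probabilities."""
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))


def balanced_kl(post_logits, prior_logits, prior_weight=0.8, post_weight=0.2):
    """Return (loss, grad wrt posterior logits, grad wrt prior logits).

    Assumed formulation (placeholder weights):
        loss = prior_weight * KL(stop_grad(post) || prior)
             + post_weight  * KL(post || stop_grad(prior))
    The stop-gradients do not change the loss value; they only decide
    which weight scales the gradient that reaches each distribution.
    """
    p = softmax(post_logits)
    q = softmax(prior_logits)
    kl = kl_categorical(p, q)
    loss = (prior_weight + post_weight) * kl

    # d KL(p || q) / d prior_logits = q - p  (posterior treated as constant)
    grad_prior = [prior_weight * (qi - pi) for pi, qi in zip(p, q)]

    # d KL(p || q) / d post_logits = p * (log p - log q - KL)  (prior constant)
    grad_post = [
        post_weight * pi * (math.log(pi) - math.log(qi) - kl)
        for pi, qi in zip(p, q)
    ]
    return loss, grad_post, grad_prior


if __name__ == "__main__":
    loss, g_post, g_prior = balanced_kl([1.0, 0.0, -1.0], [0.0, 0.0, 0.0])
    print("loss:", loss)
    print("grad wrt posterior logits:", g_post)
    print("grad wrt prior logits:", g_prior)
```

With a larger prior weight, the prior is pulled toward the posterior more strongly than the posterior is pulled toward the prior. Setting `post_weight=0.0` trains only the prior, and setting `prior_weight=0.0` only regularizes the posterior.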
