R2-Dreamer: Redundancy-Reduced World Models without Decoders or Augmentation

Naoki Morihira, Amal Nahar, Kartik Bharadwaj, Yasuhiro Kato, Akinobu Hayashi, Tatsuya Harada

Abstract

A central challenge in image-based Model-Based Reinforcement Learning (MBRL) is to learn representations that distill essential information from irrelevant visual details. While promising, reconstruction-based methods often waste capacity on large task-irrelevant regions. Decoder-free methods instead learn robust representations by leveraging Data Augmentation (DA), but reliance on such external regularizers limits versatility. We propose R2-Dreamer, a decoder-free MBRL framework with a self-supervised objective that serves as an internal regularizer, preventing representation collapse without resorting to DA. The core of our method is a redundancy-reduction objective inspired by Barlow Twins, which can be easily integrated into existing frameworks. On DeepMind Control Suite and Meta-World, R2-Dreamer is competitive with strong baselines such as DreamerV3 and TD-MPC2 while training 1.59x faster than DreamerV3, and yields substantial gains on DMC-Subtle with tiny task-relevant objects. These results suggest that an effective internal regularizer can enable versatile, high-performance decoder-free MBRL. Code is available at https://github.com/NM512/r2dreamer.

Paper Structure (37 sections, 11 equations, 12 figures, 2 tables, 1 algorithm)

Figures (12)

  • Figure 1: Comparison of representation learning mechanisms in world models. (a) R2-Dreamer learns representations without a decoder or DA. It uses an internal redundancy reduction objective $\mathcal{L}_{\mathrm{BT}}$ that aligns the latent state $s_t$ (via a projector) with the embedding of the observation $o_t$. (b) Dreamer relies on a decoder to learn representations by reconstructing the observation $\hat{o}_t$ from the latent state $s_t$, guided by a reconstruction loss $\mathcal{L}_{\mathrm{recon}}$. (c) DreamerPro removes the decoder but depends on DA. It enforces consistency between augmented views of the observation $\mathrm{aug}(o_t)$ using a spatial loss $\mathcal{L}_{\mathrm{SwAV}}$ and a temporal loss $\mathcal{L}_{\mathrm{Temp}}$ that leverages an Exponential Moving Average (EMA) of the encoder weights.
  • Figure 2: Examples drawn from benchmark tasks. Left: Meta-World Assemble. Center: DMC Reacher (hard). Right: DMC-Subtle Reacher with a significantly smaller target.
  • Figure 3: Mean and median performance over 20 DMC tasks, with standard deviation across seeds. R2-Dreamer is competitive with the baselines on average without requiring a decoder or DA.
  • Figure 4: Aggregated performance on Meta-World 50 tasks using the mean and the median across tasks. R2-Dreamer achieves strong results even on contact-rich manipulation tasks, remaining competitive with the baselines on average.
  • Figure 5: Performance on five challenging DMC-Subtle tasks. R2-Dreamer substantially outperforms the baselines, demonstrating its robustness to subtle but critical visual information.
  • ...and 7 more figures
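
The caption of Figure 1(a) describes a redundancy-reduction objective $\mathcal{L}_{\mathrm{BT}}$ that aligns a projection of the latent state $s_t$ with the embedding of the observation $o_t$. The sketch below illustrates the general Barlow Twins form of such a loss in NumPy: standardize both feature batches, compute their cross-correlation matrix, push the diagonal toward 1, and push the off-diagonal toward 0. It is not the authors' implementation. The function name, the off-diagonal weight `lambda_offdiag`, the feature sizes, and the random stand-in data are all assumptions made for illustration, and details such as the projector architecture or gradient handling are not given in this summary.

```python
import numpy as np


def barlow_twins_loss(z_a, z_b, lambda_offdiag=5e-3, eps=1e-8):
    """Barlow Twins style loss between two feature batches of shape (N, D).

    lambda_offdiag is an illustrative weight, not a value taken from the paper.
    """
    n = z_a.shape[0]
    # Standardize every feature dimension across the batch.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    # D x D cross-correlation matrix between the two feature sets.
    c = z_a.T @ z_b / n
    diag = np.diag(c)
    # Invariance term: matching dimensions should correlate perfectly.
    on_diag = np.sum((diag - 1.0) ** 2)
    # Redundancy term: different dimensions should be decorrelated.
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)
    return on_diag + lambda_offdiag * off_diag


# Hypothetical stand-ins: `embed` plays the role of the observation embedding,
# the other arrays play the role of projected latent states.
rng = np.random.default_rng(0)
embed = rng.normal(size=(256, 8))
aligned = embed + 0.01 * rng.normal(size=(256, 8))  # tracks the embedding
unrelated = rng.normal(size=(256, 8))               # carries no shared information
collapsed = np.ones((256, 8))                       # constant (collapsed) features

loss_aligned = barlow_twins_loss(aligned, embed)
loss_unrelated = barlow_twins_loss(unrelated, embed)
loss_collapsed = barlow_twins_loss(collapsed, embed)
```

In this toy setting the aligned features give a much lower loss than the unrelated ones, and the constant features are penalized through the diagonal term. This is the sense in which such an objective can act as an internal regularizer against representation collapse without relying on augmented views.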