Adapting World Models with Latent-State Dynamics Residuals

JB Lanier, Kyungmin Kim, Armin Karamzade, Yifei Liu, Ankita Sinha, Kat He, Davide Corsi, Roy Fox

TL;DR

The paper tackles the sim-to-real gap in RL by learning a latent-space residual correction to a pretrained latent-state world model (DRAW), yielding ReDRAW. The world model, including its latent encoder, is trained in simulation; a small residual $\delta$ is then learned from offline real-world data to align the latent-state dynamics with the real environment. The policy is optimized via imagined rollouts under the corrected dynamics, without requiring real rewards. Empirically, ReDRAW outperforms finetuning and zero-shot baselines on four DeepMind Control (DMC) environment pairs and succeeds at sim-to-real Duckiebot lane-following with only about 17 minutes of real data, while remaining robust to overfitting in low-data regimes. The work demonstrates practical, data-efficient dynamics adaptation for vision-based robotic control and provides open-source tools for sim-to-real exploration. The approach combines latent representation learning, residual dynamics, and model-based RL to enable reliable deployment in real environments even with limited real data.

Abstract

Simulation-to-reality reinforcement learning (RL) faces the critical challenge of reconciling discrepancies between simulated and real-world dynamics, which can severely degrade agent performance. A promising approach involves learning corrections to simulator forward dynamics represented as a residual error function; however, this operation is impractical with high-dimensional states such as images. To overcome this, we propose ReDRAW, a latent-state autoregressive world model pretrained in simulation and calibrated to target environments through residual corrections of latent-state dynamics rather than of explicit observed states. Using this adapted world model, ReDRAW enables RL agents to be optimized with imagined rollouts under corrected dynamics and then deployed in the real world. In multiple vision-based MuJoCo domains and a physical robot visual lane-following task, ReDRAW effectively models changes to dynamics and avoids overfitting in low-data regimes where traditional transfer methods fail.
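
To make the idea of a residual correction on latent-state dynamics concrete, the following is a minimal sketch and not the authors' implementation. It assumes a categorical (one-hot) latent state, a frozen base dynamics model that outputs logits over the next latent, and a residual that is added to those logits and trained by cross-entropy on target-environment transitions. The linear models and the names `base_logits`, `residual_logits`, and `fit_residual` are hypothetical stand-ins chosen for illustration.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def base_logits(z, a, W_base):
    # Hypothetical stand-in for dynamics pretrained in simulation (kept frozen).
    return np.concatenate([z, a], axis=-1) @ W_base

def residual_logits(z, a, W_res):
    # Hypothetical residual correction, the only part trained on target data.
    return np.concatenate([z, a], axis=-1) @ W_res

def corrected_next_latent_probs(z, a, W_base, W_res):
    # Corrected dynamics: base logits plus residual logits (an assumed form).
    return softmax(base_logits(z, a, W_base) + residual_logits(z, a, W_res))

def cross_entropy(probs, targets):
    # Mean cross-entropy between predicted probabilities and one-hot targets.
    return float(-(targets * np.log(probs + 1e-12)).sum(axis=-1).mean())

def fit_residual(Z, A, Z_next, W_base, steps=200, lr=0.5):
    # Gradient descent on the residual weights only; W_base is never updated.
    X = np.concatenate([Z, A], axis=-1)
    W_res = np.zeros_like(W_base)  # zero residual reproduces the base model
    for _ in range(steps):
        probs = softmax(X @ W_base + X @ W_res)
        grad = X.T @ (probs - Z_next) / len(X)  # gradient of mean cross-entropy
        W_res -= lr * grad
    return W_res
```

Initializing the residual at zero means the adapted model starts out identical to the simulation-trained one, and only a small set of parameters is fit to the limited target data. An RL agent could then be trained on rollouts sampled from `corrected_next_latent_probs` instead of the uncorrected base dynamics.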

Paper Structure

This paper contains 34 sections, 7 equations, 11 figures, 2 tables.

Figures (11)

  • Figure 1: (Left) The DRAW world model is trained to encode states into a latent representation, from which states, rewards, terminations, and future latent states are predicted. An RL agent is trained in the world model via synthetic rollouts. (Right) World model dynamics can be calibrated to a target environment by training a residual error correction on latent state dynamics predictions, allowing the RL agent to be trained under rectified dynamics.
  • Figure 2: Average evaluation episode return transferring from each DMC environment to a modified variant of it given $4 \times 10^4$ offline target environment transition samples. Shaded regions indicate the standard error of the mean over 4 seeds for each method. ReDRAW consistently achieves high returns in the target environments and avoids overfitting.
  • Figure 3: Impact of offline adaptation dataset size and source/target domain data collection strategies on ReDRAW. Expert demonstrations consistently provide useful target domain data for adaptation. Collecting diverse simulation experience with a method like Plan2Explore is essential for good transfer performance.
  • Figure 4: (a) Digital-twin simulation constructed using Gaussian splatting. (b) Real-world robot lane-following environment. (c) Simulation state image component. (d) Real-world state image component. The agent is tasked to drive quickly around the track while staying near the lane center using an egocentric camera and velocity sensor. We train our DRAW world model in simulation and calibrate its dynamics with ReDRAW on offline real trajectories, producing a successful agent in the real environment.
  • Figure 5: (Left) The actor and critic are trained by interacting with the world model. Starting from an environment state sampled from the replay buffer, the world model generates imagined rollouts using actions provided by the actor. The residual component is omitted during DRAW pretraining. (Right) At deployment, only the encoder and actor modules are utilized. The immediate environment state is processed by the encoder, and the actor generates an action based on $z_t$ sampled from $\sigma_t$.
  • ...and 6 more figures