Adapting World Models with Latent-State Dynamics Residuals
JB Lanier, Kyungmin Kim, Armin Karamzade, Yifei Liu, Ankita Sinha, Kat He, Davide Corsi, Roy Fox
TL;DR
The paper tackles the sim-to-real gap in RL by learning a latent-space residual correction to a pretrained latent-state world model (DRAW), yielding ReDRAW. It trains the latent encoder in simulation, then learns a small residual function δ on offline real-world data to align the latent-state dynamics with the real world. This enables policy optimization via imagined rollouts under the corrected dynamics without requiring real-world rewards. Empirically, ReDRAW outperforms finetuning and zero-shot baselines on four DeepMind Control source/target environment pairs and succeeds in sim-to-real Duckiebot lane-following with only about 17 minutes of real data, while remaining robust to overfitting in low-data regimes. The work demonstrates practical, data-efficient dynamics adaptation for vision-based robotic control and provides open-source tools for sim-to-real exploration. The approach combines latent representation learning, residual dynamics, and model-based RL to enable reliable deployment in real environments even with limited real data.
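The core idea, keeping the simulator-trained latent dynamics frozen and fitting only a small residual on encoded real transitions, can be illustrated with a toy sketch. This is not the paper's architecture: it assumes continuous latents, linear stand-in dynamics (`f_sim`, `f_real` are hypothetical), and a linear residual fitted by least squares, whereas the actual model learns a neural residual inside a world model.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT, ACTION = 4, 2

# Hypothetical stand-in for frozen, simulator-trained latent dynamics f_sim(z, a).
A_sim = 0.9 * np.eye(LATENT)
B_sim = rng.normal(size=(LATENT, ACTION))

def f_sim(z, a):
    return z @ A_sim.T + a @ B_sim.T

# Hypothetical "real" dynamics that differ slightly from simulation.
A_real = A_sim + 0.05 * rng.normal(size=(LATENT, LATENT))

def f_real(z, a):
    return z @ A_real.T + a @ B_sim.T

# Offline target-domain transitions, assumed already encoded to latents.
N = 500
Z = rng.normal(size=(N, LATENT))
U = rng.normal(size=(N, ACTION))
Z_next = f_real(Z, U)

# Residual target: the part of the next latent the frozen sim model gets wrong.
residual_target = Z_next - f_sim(Z, U)

# Fit a linear residual delta(z, a) = [z, a] @ W; f_sim itself is never updated.
X = np.concatenate([Z, U], axis=1)
W, *_ = np.linalg.lstsq(X, residual_target, rcond=None)

def f_corrected(z, a):
    # Corrected dynamics = frozen sim prediction + learned residual.
    return f_sim(z, a) + np.concatenate([z, a], axis=-1) @ W

# Compare one-step prediction error on held-out transitions.
Z_test = rng.normal(size=(100, LATENT))
U_test = rng.normal(size=(100, ACTION))
true_next = f_real(Z_test, U_test)
sim_err = np.mean((f_sim(Z_test, U_test) - true_next) ** 2)
corrected_err = np.mean((f_corrected(Z_test, U_test) - true_next) ** 2)
print(f"sim-only MSE: {sim_err:.4f}, corrected MSE: {corrected_err:.2e}")
```

Because only the residual's few parameters are trained, the sketch reflects the intuition that a low-capacity correction on top of a frozen model is less prone to overfitting scarce real data than finetuning the whole model.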
Abstract
Simulation-to-reality reinforcement learning (RL) faces the critical challenge of reconciling discrepancies between simulated and real-world dynamics, which can severely degrade agent performance. A promising approach involves learning corrections to simulator forward dynamics represented as a residual error function; however, this is impractical with high-dimensional states such as images. To overcome this, we propose ReDRAW, a latent-state autoregressive world model pretrained in simulation and calibrated to target environments through residual corrections of latent-state dynamics rather than of explicit observed states. Using this adapted world model, ReDRAW enables RL agents to be optimized with imagined rollouts under corrected dynamics and then deployed in the real world. In multiple vision-based MuJoCo domains and a physical robot visual lane-following task, ReDRAW effectively models changes to dynamics and avoids overfitting in low-data regimes where traditional transfer methods fail.
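To make "optimizing agents with imagined rollouts under corrected dynamics" concrete, here is a minimal sketch. All components are hypothetical toys: linear latent dynamics with an assumed residual term `A_RES`, a quadratic stand-in for a reward head learned in simulation, and a linear policy. Simple hill-climbing random search is swapped in for the actor-critic learner a Dreamer-style agent would normally use; no real environment is queried during optimization.

```python
import numpy as np

rng = np.random.default_rng(1)
LATENT, HORIZON, GAMMA = 3, 15, 0.99

# Hypothetical frozen sim dynamics plus an assumed learned residual correction.
A_SIM = np.array([[1.0, 0.1, 0.0],
                  [0.0, 1.0, 0.1],
                  [0.0, 0.0, 0.95]])
B_SIM = np.array([[0.0], [0.0], [0.1]])
A_RES = 0.02 * np.eye(LATENT)

def corrected_dynamics(z, a):
    return A_SIM @ z + B_SIM @ a + A_RES @ z

def reward_head(z, a):
    # Stand-in for a reward predictor trained in simulation: keep latent near zero.
    return -float(z @ z) - 0.01 * float(a @ a)

def imagined_return(K, z0):
    # Roll out a linear policy a = -K z entirely inside the corrected model.
    z, total = z0.copy(), 0.0
    for t in range(HORIZON):
        a = -K @ z
        total += GAMMA ** t * reward_head(z, a)
        z = corrected_dynamics(z, a)
    return total

start_latents = rng.normal(size=(8, LATENT))

def score(K):
    return float(np.mean([imagined_return(K, z0) for z0 in start_latents]))

# Hill-climbing random search over policy gains, starting from the zero policy.
best_K = np.zeros((1, LATENT))
best_score = score(best_K)
zero_policy_score = best_score
for _ in range(200):
    candidate = best_K + 0.5 * rng.normal(size=best_K.shape)
    s = score(candidate)
    if s > best_score:
        best_K, best_score = candidate, s

print("best gains:", best_K, "imagined return:", round(best_score, 3))
```

The resulting policy would then be deployed in the target environment; whether it transfers depends entirely on how well the corrected model matches real dynamics.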
