World Models as Reference Trajectories for Rapid Motor Adaptation
Carlos Stein Brito, Daniel McNamee
TL;DR
The paper tackles the problem of sustaining performance when real-world dynamics change by introducing Reflexive World Models (RWM), a dual-control framework that uses world-model predictions as implicit reference trajectories for rapid adaptation. A base RL policy operates in a learned latent space to maximize long-term reward, while a lightweight adaptive controller uses forward-model predictions to track those references, providing fast error correction with low online cost. The authors derive control-theoretic guarantees linking world-model accuracy, control authority, and external perturbations to bounded error and value performance, and demonstrate robust adaptation across high-dimensional locomotion tasks under actuator perturbations. Empirical results show that RWM achieves faster adaptation and higher robustness than model-based RL baselines and domain-randomized pre-training, illustrating a principled bridge between adaptive control and modern RL for reliable real-world deployment.
Abstract
Deploying learned control policies in real-world environments poses a fundamental challenge: when system dynamics change unexpectedly, performance degrades until models are retrained on new data. We introduce Reflexive World Models (RWM), a dual-control framework that uses world-model predictions as implicit reference trajectories for rapid adaptation. Our method separates the control problem into long-term reward maximization through reinforcement learning and robust motor execution through rapid latent control. This dual architecture adapts significantly faster than model-based RL baselines, at low online computational cost, while maintaining near-optimal performance. By combining flexible policy learning with rapid error correction, RWM offers a principled way to maintain performance in high-dimensional continuous control tasks under varying dynamics.
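To make the idea of a world-model prediction serving as a reference trajectory concrete, the following is a minimal sketch on a scalar toy system. It is not the authors' implementation. The linear dynamics, the proportional base policy standing in for the RL policy, the constant actuator bias, and the integral-style correction rule (with all names and parameter values such as `simulate`, `eta`, and `bias`) are assumptions chosen for illustration. The sketch also works directly on the raw state and assumes the control gain `b` is known, whereas the paper operates in a learned latent space.

```python
def simulate(adapt, steps=200, goal=1.0, a=1.0, b=0.1, k=5.0, bias=-2.0, eta=0.5):
    """Run a scalar toy system and return the final distance to the goal.

    Nominal (world-model) dynamics: x' = a*x + b*u
    True dynamics:                  x' = a*x + b*(u + bias)   # perturbed actuator
    All values are hypothetical and chosen only for illustration.
    """
    x = 0.0
    correction = 0.0  # state of the adaptive controller
    for _ in range(steps):
        # Base policy: a proportional controller standing in for the RL policy.
        u_base = k * (goal - x)

        # The nominal model's prediction under the base action is the reference.
        x_ref = a * x + b * u_base

        # Apply the base action plus the current correction to the true system.
        u = u_base + correction
        x_next = a * x + b * (u + bias)

        if adapt:
            # Integrate the tracking error (reference minus outcome), mapped
            # back to action units through the assumed-known gain b.
            correction += eta * (x_ref - x_next) / b

        x = x_next
    return abs(goal - x)


if __name__ == "__main__":
    print("final error without adaptation:", simulate(adapt=False))
    print("final error with adaptation:   ", simulate(adapt=True))
```

With these values, the unadapted base policy settles roughly 0.4 away from the goal, because the actuator bias shifts the closed-loop fixed point. With the correction enabled, the accumulated term converges to the negative of the bias and the state reaches the goal. The base policy is never retrained. Only the low-dimensional correction changes, which is the sense in which online adaptation is cheap in this sketch.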
