Dynamics-Aligned Latent Imagination in Contextual World Models for Zero-Shot Generalization
Frank Röder, Jan Benad, Manfred Eppe, Pradeep Kr. Banerjee
TL;DR
DALI integrates a dynamics-aligned context encoder into DreamerV3 to infer latent environmental context from short interaction histories, enabling zero-shot generalization across unseen cMDP contexts. The core ideas are forward dynamics alignment and cross-modal regularization to produce a robust context representation that conditions the world model and policy. Theoretical results show the encoder achieves near-optimal context information with short windows under $\beta$-mixing and reduces information bottlenecks in the recurrent state, yielding a favorable sample complexity relative to full-episode context estimation. Empirically, DALI achieves significant extrapolation gains over context-unaware baselines and surpasses some ground-truth context baselines, while enabling physically consistent counterfactuals that align with Newtonian dynamics. Overall, DALI advances robust, sample-efficient zero-shot generalization in partially observable, context-shifted environments with minimal architectural overhead.
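The idea of forward dynamics alignment can be illustrated with a deliberately tiny, linear analogue. This is not DALI's implementation, which lives inside DreamerV3. The sketch assumes a hypothetical 1D system whose velocity obeys v' = v + DT·(a − g) + noise, where the gravity g is hidden from the agent and changes between episodes. A linear encoder mean-pools a short window of K = 8 transitions into a scalar context z. Its weights are chosen to minimize the next-state prediction error of a z-conditioned forward model on a held-out transition from the same episode. Because the forward model is linear in z, that minimization reduces to ordinary least squares. All names and constants (`DT`, `K`, `NOISE`, `fit_encoder`, the gravity ranges) are illustrative assumptions. The evaluation mirrors the extrapolation setting: contexts are trained with g in [5, 15] and tested at g = 25.

```python
import numpy as np

DT = 0.05     # hypothetical integration step
K = 8         # short context window: transitions observed per episode
NOISE = 0.01  # std of process noise

def sample_transitions(rng, g, n):
    """Toy context-dependent dynamics v' = v + DT*(a - g) + noise; g is hidden."""
    v = rng.normal(0.0, 1.0, n)
    a = rng.uniform(-1.0, 1.0, n)
    v_next = v + DT * (a - g) + rng.normal(0.0, NOISE, n)
    return np.stack([v, a, v_next], axis=1)  # rows are (v, a, v')

def make_episodes(rng, gravities):
    """Per episode: mean-pooled window of K transitions plus one held-out query."""
    pooled, queries = [], []
    for g in gravities:
        ep = sample_transitions(rng, g, K + 1)
        pooled.append(ep[:K].mean(axis=0))  # permutation-invariant pooling
        queries.append(ep[K])
    return np.array(pooled), np.array(queries)

def predict_next(queries, z):
    """Forward model conditioned on the latent context z (one scalar per episode)."""
    return queries[:, 0] + DT * queries[:, 1] + z

def fit_encoder(pooled, queries):
    """Linear encoder z = pooled @ w, chosen to minimize next-state MSE on the queries.
    Since predict_next is linear in z, this is an ordinary least-squares problem."""
    residual = queries[:, 2] - predict_next(queries, 0.0)
    w, *_ = np.linalg.lstsq(pooled, residual, rcond=None)
    return w

def evaluate(seed=0, test_g=25.0):
    rng = np.random.default_rng(seed)
    train_g = rng.uniform(5.0, 15.0, 2000)  # training contexts
    w = fit_encoder(*make_episodes(rng, train_g))
    # Unseen context outside the training range (extrapolation)
    pooled, queries = make_episodes(rng, np.full(200, test_g))
    target = queries[:, 2]
    enc_rmse = np.sqrt(np.mean((predict_next(queries, pooled @ w) - target) ** 2))
    # Context-unaware baseline: always assumes the average training gravity
    base_z = -DT * train_g.mean()
    base_rmse = np.sqrt(np.mean((predict_next(queries, base_z) - target) ** 2))
    return enc_rmse, base_rmse

enc_rmse, base_rmse = evaluate()
print(f"encoder RMSE {enc_rmse:.4f} vs context-unaware RMSE {base_rmse:.4f}")
```

In this toy, the encoder's error stays near the noise floor at the unseen gravity, while the context-unaware baseline is off by roughly DT·(25 − 10). The mean pooling is one simple choice that makes the context estimate independent of transition order within the window.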
Abstract
Real-world reinforcement learning demands adaptation to unseen environmental conditions without costly retraining. Contextual Markov Decision Processes (cMDPs) model this challenge, but existing methods often require explicit context variables (e.g., friction, gravity), limiting their use when contexts are latent or hard to measure. We introduce Dynamics-Aligned Latent Imagination (DALI), a framework integrated within the Dreamer architecture that infers latent context representations from agent-environment interactions. By training a self-supervised encoder to predict forward dynamics, DALI generates actionable representations that condition the world model and policy, bridging perception and control. We theoretically prove this encoder is essential for efficient context inference and robust generalization. DALI's latent space enables counterfactual consistency: perturbing a gravity-encoding dimension alters imagined rollouts in physically plausible ways. On challenging cMDP benchmarks, DALI achieves significant gains over context-unaware baselines, often surpassing context-aware baselines in extrapolation tasks, enabling zero-shot generalization to unseen contextual variations.
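A counterfactual consistency check of this kind can be sketched as a probe. The probe shifts one latent dimension, re-imagines a rollout, fits a constant-acceleration (Newtonian) trajectory to it, and compares the fitted acceleration before and after. This is only a hedged illustration. `imagined_rollout` is a hand-written stand-in for a learned world model, and the mapping from latent dimension 0 to gravity (`G0`, `G_PER_UNIT`) is an assumption. In a trained model, which dimension encodes gravity and how strongly are not known in advance, and the fitted change would only be approximate.

```python
import numpy as np

G0, G_PER_UNIT = 9.81, 4.0  # hypothetical decoding: gravity = G0 + G_PER_UNIT * z[0]

def imagined_rollout(z, y0=10.0, v0=0.0, dt=0.05, steps=40):
    """Stand-in for a world model's imagination: heights of a falling body.
    Assumes latent dim 0 encodes gravity and dim 1 encodes something unrelated."""
    g = G0 + G_PER_UNIT * z[0]
    y, v, heights = y0, v0, []
    for _ in range(steps):
        v -= g * dt  # semi-implicit Euler: update velocity first
        y += v * dt
        heights.append(y)
    return np.array(heights)

def fitted_acceleration(heights, dt=0.05):
    """Fit y(t) = c2 t^2 + c1 t + c0; the constant Newtonian acceleration is 2 * c2."""
    t = dt * np.arange(1, len(heights) + 1)
    c2, _, _ = np.polyfit(t, heights, 2)
    return 2.0 * c2

def counterfactual_probe(z, dim, delta, dt=0.05):
    """Change in fitted acceleration when latent dimension `dim` is shifted by `delta`."""
    z_cf = np.array(z, dtype=float)
    z_cf[dim] += delta
    base = fitted_acceleration(imagined_rollout(z, dt=dt), dt)
    counterfactual = fitted_acceleration(imagined_rollout(z_cf, dt=dt), dt)
    return counterfactual - base

z = [0.0, 0.3]
print(counterfactual_probe(z, dim=0, delta=0.5))  # about -2.0: gravity rises by 4.0 * 0.5
print(counterfactual_probe(z, dim=1, delta=0.5))  # 0.0: unrelated dimension leaves the fall unchanged
```

Semi-implicit Euler makes the discrete heights exactly quadratic in t, with leading coefficient −g/2. In this stand-in, a physically consistent perturbation therefore shows up purely as a change in the quadratic term.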
