Dynamics-Aligned Latent Imagination in Contextual World Models for Zero-Shot Generalization

Frank Röder, Jan Benad, Manfred Eppe, Pradeep Kr. Banerjee

TL;DR

DALI integrates a dynamics-aligned context encoder into DreamerV3 to infer latent environmental context from short interaction histories, enabling zero-shot generalization across unseen cMDP contexts. The core ideas are forward dynamics alignment and cross-modal regularization to produce a robust context representation that conditions the world model and policy. Theoretical results show the encoder achieves near-optimal context information with short windows under $\beta$-mixing and reduces information bottlenecks in the recurrent state, yielding a favorable sample complexity relative to full-episode context estimation. Empirically, DALI achieves significant extrapolation gains over context-unaware baselines and surpasses some ground-truth context baselines, while enabling physically consistent counterfactuals that align with Newtonian dynamics. Overall, DALI advances robust, sample-efficient zero-shot generalization in partially observable, context-shifted environments with minimal architectural overhead.

Abstract

Real-world reinforcement learning demands adaptation to unseen environmental conditions without costly retraining. Contextual Markov Decision Processes (cMDP) model this challenge, but existing methods often require explicit context variables (e.g., friction, gravity), limiting their use when contexts are latent or hard to measure. We introduce Dynamics-Aligned Latent Imagination (DALI), a framework integrated within the Dreamer architecture that infers latent context representations from agent-environment interactions. By training a self-supervised encoder to predict forward dynamics, DALI generates actionable representations conditioning the world model and policy, bridging perception and control. We theoretically prove this encoder is essential for efficient context inference and robust generalization. DALI's latent space enables counterfactual consistency: Perturbing a gravity-encoding dimension alters imagined rollouts in physically plausible ways. On challenging cMDP benchmarks, DALI achieves significant gains over context-unaware baselines, often surpassing context-aware baselines in extrapolation tasks, enabling zero-shot generalization to unseen contextual variations.
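The idea of inferring a latent context from a short window of interactions by fitting forward dynamics can be illustrated with a toy example. The sketch below is not DALI's learned neural encoder: it assumes a hypothetical 1-D point mass with known dynamics `vel' = vel + (a - g) * dt`, and recovers the hidden gravity `g` from eight transitions by least squares. The function names `simulate` and `infer_context`, the time step, and the gravity values are all illustrative assumptions.

```python
import numpy as np


def simulate(g, steps, dt=0.05, noise=0.0, rng=None):
    """Roll out a toy 1-D point mass under hidden gravity g with random thrust."""
    rng = np.random.default_rng(0) if rng is None else rng
    pos, vel = 1.0, 0.0
    obs, acts = [], []
    for _ in range(steps):
        a = rng.uniform(-1.0, 1.0)      # random action (thrust)
        obs.append((pos, vel))          # observation before the action is applied
        acts.append(a)
        vel = vel + (a - g) * dt        # forward dynamics; g is the latent context
        pos = pos + vel * dt
    obs.append((pos, vel))              # final observation after the last action
    obs = np.array(obs) + noise * rng.standard_normal((steps + 1, 2))
    return obs, np.array(acts)


def infer_context(obs, acts, dt=0.05):
    """Estimate g from a window of transitions by inverting the forward model.

    From vel' = vel + (a - g) * dt it follows that g = a - (vel' - vel) / dt.
    Averaging over the window gives the least-squares estimate of a constant g.
    """
    dv = obs[1:, 1] - obs[:-1, 1]
    return float(np.mean(acts - dv / dt))


# Usage: a context value not used anywhere else, identified from 8 transitions.
obs, acts = simulate(g=3.7, steps=8)
print(f"inferred gravity: {infer_context(obs, acts):.3f}")
```

In this toy setting the context is identifiable in closed form; DALI instead learns the encoder end to end, and the inferred representation conditions the world model and policy rather than being read out as a physical quantity.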

Paper Structure

This paper contains 30 sections, 7 theorems, 89 equations, 6 figures, 2 tables, 4 algorithms.

Key Result

Theorem 1

In a cMDP with $\beta$-mixing and Lipschitz dynamics, DALI's context encoder $\textcolor{red}{\mathfrak{z}}_t$ captures near-optimal context information, $\mathcal{I}(c; \textcolor{red}{\mathfrak{z}}_t) \geq (1 - \delta) h(c)$ for $\delta \in (0, 1)$, using $N = \mathcal{O}(1/\delta^2)$ windows of $

Figures (6)

  • Figure 1: Interquartile Mean (IQM) scores and Probability of Improvement (PoI) for the DMC Ball-in-Cup and Walker Walk tasks under contextual variations: gravity and string length for Ball-in-Cup, and gravity and actuator strength for Walker Walk. Results are shown for Featurized and Pixel observations in each environment. Scores aggregate across single and combined contexts. Shaded intervals represent 95% stratified bootstrap confidence intervals over seeds and aggregated contexts. The rightmost panel in each plot displays PoI in the Extrapolation regime for DALI-S-$\chi$ (Ball-in-Cup) and DALI-S (Walker Walk), relative to baseline methods.
  • Figure 2: (a) (Pixel Modality) Counterfactual Trajectories in Pixel Space: Top: Original imagined trajectory of the Ball-in-Cup under default gravity and string length. Middle: Perturbed trajectory after adding noise $\Delta$ to the top-ranked latent dimension $\textcolor{red}{\mathfrak{z}}_6$. Bottom: Pixel-wise differences ($\delta = |\hat{o}_t - \hat{o}'_t|$). The perturbed trajectory (blue) exhibits a shorter string (frame 40) and faster acceleration (overtaking the original trajectory in frames 15 and 45), aligning with increased gravitational effects. Rollouts use zero actions to isolate passive dynamics. (b) (Featurized Modality) Ball Z-Position Under Latent Perturbation: Comparison of original (blue) and counterfactual (orange) ball height trajectories. The perturbed $\textcolor{red}{\mathfrak{z}}_6$ reduces oscillation amplitude (lower peak Z-position) and accelerates descent, consistent with shorter string length and higher gravity. (c) (Featurized Modality) Ball Z-Velocity Under Latent Perturbation: Velocity profiles for original (blue) and counterfactual (orange) trajectories. The perturbed $\textcolor{red}{\mathfrak{z}}_6$ induces earlier and higher velocity peaks, confirming faster swing dynamics. This mirrors the pixel-based evidence of increased gravitational acceleration.
  • Figure 3: DALI architecture overview.
  • Figure 4: Learning curves for DMC Ball-in-Cup and Walker Walk tasks under Featurized and Pixel-based observation modalities. Results show mean episode returns with $25$th–$75$th percentile confidence intervals.
  • Figure 5: AUC $\pm$ 95% CI per context dimension $\textcolor{red}{\mathfrak{z}}_j$ for the Ball-in-Cup task.
  • ...and 1 more figure
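The aggregate metrics named in the Figure 1 caption (IQM, stratified bootstrap confidence intervals, and Probability of Improvement) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' evaluation code: scores are assumed to be arranged as a `(n_seeds, n_contexts)` array, the bootstrap resamples seeds independently within each context, and PoI is estimated as the fraction of score pairs in which one method beats the other, with ties counted as one half. All function names and the example scores are made up for illustration.

```python
import numpy as np


def iqm(scores):
    """Interquartile mean: the mean of the middle 50% of all scores."""
    s = np.sort(np.asarray(scores, dtype=float).ravel())
    cut = int(0.25 * s.size)            # number of values dropped from each tail
    return float(s[cut:s.size - cut].mean())


def stratified_bootstrap_ci(scores, stat=iqm, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI; seeds are resampled independently within each context."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)    # shape (n_seeds, n_contexts)
    n, m = scores.shape
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=(n, m))   # one seed draw per cell, per column
        stats.append(stat(np.take_along_axis(scores, idx, axis=0)))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


def probability_of_improvement(x, y):
    """Estimate P(X > Y) over all score pairs, counting ties as one half."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    greater = (x[:, None] > y[None, :]).mean()
    ties = (x[:, None] == y[None, :]).mean()
    return float(greater + 0.5 * ties)


# Usage with made-up scores: the outlier 100 does not affect the IQM.
print(iqm([0, 1, 2, 3, 4, 5, 6, 100]))  # 3.5
```

The IQM discards the top and bottom quarter of runs, which makes it less sensitive to outlier seeds than the mean while using more of the data than the median.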

Theorems & Definitions (15)

  • Theorem 1
  • Theorem 2: Necessity and efficiency of Context encoder
  • Remark 3: Non-trivial sample complexity gain of DALI’s Context encoder
  • Remark 4: Consistency of DALI’s sample complexity across integration strategies
  • Theorem 5: Context encoder reduces information bottleneck
  • Lemma 6: Entropy decay from mixing
  • Proof of Lemma 6
  • Lemma 7: DALI’s generalization bound for context inference
  • Proof of Lemma 7
  • Lemma 8: Information bottleneck in DreamerV3’s RSSM
  • ...and 5 more