Prismatic World Model: Learning Compositional Dynamics for Planning in Hybrid Systems

Mingwei Li, Xiaoyuan Zhang, Chengwei Yang, Zilong Zheng, Yaodong Yang

TL;DR

PRISM-WM tackles the failure modes of monolithic latent dynamics in hybrid systems by decomposing transitions into composable base dynamics via a context-aware Mixture-of-Experts with latent orthogonalization. The gating mechanism implicitly identifies physical regimes while specialized experts model regime-specific transitions, reducing long-horizon rollout drift and mitigating mode interference. The approach functions as a drop-in enhancement for both online planning (TD-MPC) and direct policy learning (PWM), improving planning fidelity and gradient stability. Experimental results across DiffRL, MT30, and Humanoid benchmarks demonstrate superior sample efficiency, better generalization, and robust long-horizon performance, highlighting PRISM-WM as a strong foundational component for next-generation model-based agents.

Abstract

Model-based planning in robotic domains is fundamentally challenged by the hybrid nature of physical dynamics, where continuous motion is punctuated by discrete events such as contacts and impacts. Conventional latent world models typically employ monolithic neural networks that enforce global continuity, inevitably over-smoothing the distinct dynamic modes (e.g., sticking vs. sliding, flight vs. stance). For a planner, this smoothing results in catastrophic compounding errors during long-horizon lookaheads, rendering the search process unreliable at physical boundaries. To address this, we introduce the Prismatic World Model (PRISM-WM), a structured architecture designed to decompose complex hybrid dynamics into composable primitives. PRISM-WM leverages a context-aware Mixture-of-Experts (MoE) framework where a gating mechanism implicitly identifies the current physical mode, and specialized experts predict the associated transition dynamics. We further introduce a latent orthogonalization objective to ensure expert diversity, effectively preventing mode collapse. By accurately modeling the sharp mode transitions in system dynamics, PRISM-WM significantly reduces rollout drift. Extensive experiments on challenging continuous control benchmarks, including high-dimensional humanoids and diverse multi-task settings, demonstrate that PRISM-WM provides a superior high-fidelity substrate for trajectory optimization algorithms (e.g., TD-MPC), proving its potential as a powerful foundational model for next-generation model-based agents.
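
The gated decomposition described in the abstract can be illustrated with a minimal sketch. This section does not give the paper's actual gating network, expert architecture, or the exact form of the orthogonalization objective, so everything below is an assumption made for illustration: linear experts, a softmax gate over the concatenated latent state and action, and a squared cosine-similarity penalty between expert outputs. The names `MoEDynamics` and `orthogonality_penalty` are hypothetical, and plain Python is used only to keep the example dependency-free.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def random_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class MoEDynamics:
    """Hypothetical latent dynamics: z_next = z + sum_k g_k(z, a) * f_k(z, a)."""

    def __init__(self, latent_dim, action_dim, num_experts, seed=0):
        rng = random.Random(seed)
        in_dim = latent_dim + action_dim
        # Assumption: a linear gate and linear experts stand in for the real networks.
        self.gate = random_matrix(num_experts, in_dim, rng)
        self.experts = [random_matrix(latent_dim, in_dim, rng)
                        for _ in range(num_experts)]

    def step(self, z, a):
        x = list(z) + list(a)
        # The gate assigns a soft weight to each expert (the "active regime").
        weights = softmax(matvec(self.gate, x))
        # Each expert proposes its own residual for the latent state.
        deltas = [matvec(W, x) for W in self.experts]
        # The predicted residual is the gate-weighted mixture of expert residuals.
        delta_z = [sum(w * d[i] for w, d in zip(weights, deltas))
                   for i in range(len(z))]
        z_next = [zi + di for zi, di in zip(z, delta_z)]
        return z_next, weights, deltas


def orthogonality_penalty(deltas, eps=1e-8):
    """Sum of squared cosine similarities over all pairs of expert outputs.

    Assumption: one simple way to encourage expert diversity; the penalty is
    zero when all expert outputs are mutually orthogonal.
    """
    penalty = 0.0
    for i in range(len(deltas)):
        for j in range(i + 1, len(deltas)):
            ni = math.sqrt(dot(deltas[i], deltas[i])) + eps
            nj = math.sqrt(dot(deltas[j], deltas[j])) + eps
            penalty += (dot(deltas[i], deltas[j]) / (ni * nj)) ** 2
    return penalty


model = MoEDynamics(latent_dim=4, action_dim=2, num_experts=3)
z, a = [0.5, -0.2, 0.1, 0.0], [1.0, -1.0]
z_next, weights, deltas = model.step(z, a)
penalty = orthogonality_penalty(deltas)
```

In a training loop, a term like `penalty` would be added to the prediction loss with some weighting coefficient, and `step` would be applied repeatedly to produce the multi-step rollouts a planner consumes. Both the coefficient and the rollout procedure are left out here because this section does not specify them.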

Paper Structure

This paper contains 41 sections, 8 equations, 10 figures, 1 table, 1 algorithm.

Figures (10)

  • Figure 1: The PRISM-WM architecture. To capture hybrid dynamics, the model structurally decomposes transitions: the Gating Network identifies the active latent regime, while Orthogonal Experts learn a diverse, non-redundant basis for the residual dynamics $\Delta Z$, preventing mode collapse during planning.
  • Figure 2: PRISM-WM planning lookahead. (Top) A selected stable trajectory (green) where the planner successfully maintains locomotion. (Bottom) A pruned branch where the world model accurately predicts a sharp failure discontinuity (loss of balance, red), enabling the planner to reject this unsafe action.
  • Figure 3: A Gallery of Diverse and Challenging Evaluation Environments. Our experiments are conducted across a wide range of continuous control benchmarks. These include locomotion tasks of varying difficulty, such as standard walkers and quadrupeds in DiffRL and DMControl, as well as complex whole-body humanoid control (Humanoid-Bench). This diversity validates the ability of our model to handle heterogeneous dynamics.
  • Figure 4: Benchmark performance comparison on high-dimensional locomotion tasks. The plots show the mean episode reward versus environment steps averaged over 5 random seeds (shaded regions represent one standard deviation). Our method (Prismatic model) consistently achieves higher sample efficiency and superior asymptotic performance compared to baselines, particularly on high-dimensional humanoid tasks.
  • Figure 5: Performance comparison on the MT30 multi-task benchmark. At the aggregate level, PRISM-WM achieves a mean normalized score of 0.531, representing a 23.5% improvement over TD-MPC2 (0.430). The plot provides a detailed breakdown of the normalized scores for each individual task.
  • ...and 5 more figures
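
The improvement quoted in the Figure 5 caption can be checked directly from the two aggregate scores, reading it as a relative gain over the TD-MPC2 baseline:

$$\frac{0.531 - 0.430}{0.430} = \frac{0.101}{0.430} \approx 0.235 = 23.5\%$$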