Dreaming the Unseen: World Model-regularized Diffusion Policy for Out-of-Distribution Robustness

Ziou Hu, Xiangtong Yao, Yuan Meng, Zhenshan Bing, Alois Knoll

Abstract

Diffusion policies excel at visuomotor control but often fail catastrophically under severe out-of-distribution (OOD) disturbances, such as unexpected object displacements or visual corruptions. To address this vulnerability, we introduce the Dream Diffusion Policy (DDP), a framework that deeply integrates a diffusion world model into the policy's training objective via a shared 3D visual encoder. This co-optimization endows the policy with robust state-prediction capabilities. When encountering sudden OOD anomalies during inference, DDP detects the real-imagination discrepancy and actively abandons the corrupted visual stream. Instead, it relies on its internal "imagination" (autoregressively forecasted latent dynamics) to safely bypass the disruption, generating imagined trajectories before smoothly realigning with physical reality. Extensive evaluations demonstrate DDP's exceptional resilience. Notably, DDP achieves a 73.8% OOD success rate on MetaWorld (vs. 23.9% without predictive imagination) and an 83.3% success rate under severe real-world spatial shifts (vs. 3.3% without predictive imagination). Furthermore, as a stress test, DDP maintains a 76.7% real-world success rate even when relying entirely on open-loop imagination post-initialization.
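The switching behaviour described in the abstract can be illustrated with a toy control loop. This is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `run_with_imagination`, the Euclidean distance as the discrepancy measure, the fixed `threshold`, and the simple recovery rule (return to real observations as soon as the discrepancy drops) are all hypothetical stand-ins, and `world_model` and `policy` are placeholder callables rather than diffusion models.

```python
import numpy as np

def run_with_imagination(real_latents, world_model, policy, z0, threshold):
    """Toy loop: use real latents unless they disagree with the forecast.

    All names and the thresholding rule are illustrative assumptions.
    """
    history = z0
    actions, modes = [], []
    for z_real in real_latents:
        # One-step latent forecast from the latent used at the previous step.
        z_pred = world_model(history)
        # Real-imagination discrepancy (assumed here to be an L2 distance).
        gap = float(np.linalg.norm(z_real - z_pred))
        if gap > threshold:
            # Treat the observation as corrupted and act on the forecast.
            z_used, mode = z_pred, "imagined"
        else:
            z_used, mode = z_real, "real"
        actions.append(policy(z_used))
        modes.append(mode)
        # Autoregressive step: the next forecast conditions on the latent
        # that was actually used, so imagined states chain together.
        history = z_used
    return actions, modes

# Toy dynamics: the latent drifts by 0.1 per step; steps 2 and 3 are corrupted.
drift = lambda z: z + 0.1
clean = [np.full(2, 0.1 * (k + 1)) for k in range(5)]
observed = [z + 5.0 if k in (2, 3) else z for k, z in enumerate(clean)]
actions, modes = run_with_imagination(
    observed, drift, policy=lambda z: -z, z0=np.zeros(2), threshold=1.0
)
print(modes)
```

In this toy setting the loop follows the real latents, switches to forecasted latents for the two corrupted steps, and realigns once the observation again matches the forecast. The actual DDP pipeline additionally halts and tracks the object before entering the imagination loop, which this sketch omits.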

Paper Structure

This paper contains 19 sections, 6 equations, 10 figures, 3 tables, 2 algorithms.

Figures (10)

  • Figure 1: Overview of the Dream Diffusion Policy (DDP) framework. (a) A shared 3D encoder extracts unified latent embeddings from point clouds and proprioception. These history latents jointly condition the Diffusion Policy, which denoises action chunks, and the Diffusion World Model (via channel-wise concatenation), which forecasts future latent states. (b) During inference, upon detecting an out-of-distribution (OOD) anomaly, the system halts, tracks the object, and transitions into an imagination loop. The World Model autoregressively forecasts latent states to drive the policy's action generation, safely bypassing corrupted observations until recovery is triggered.
  • Figure 2: Overview of the DDP workflow, illustrating the closed-loop transitions between physical reality (ID) and internal imagination (OOD) across subtasks.
  • Figure 3: Out-of-Distribution (OOD) spatial disturbances in MetaWorld. Yellow dashed boxes highlight target objects, and solid arrows indicate the direction of dynamic displacements injected mid-task. The exact linear translation vectors $(x, y, z)$ applied to shift each object out-of-distribution are detailed below.
  • Figure 4: DDP execution on MetaWorld Assembly and Push. Sequences show the transition from the ID Base Policy (green) to OOD tracking (yellow), bypassing visual shifts via the imagined Dream Policy (pink), and finally recovering to the target.
  • Figure 5: Robot Setup and Object Tracking. (a) The hardware used in the real-world experiments. (b) The bounding boxes used for object tracking across all tasks.
  • ...and 5 more figures