DADP: Domain Adaptive Diffusion Policy

Pengcheng Wang, Qinghang Liu, Haotian Lin, Yiheng Li, Guojian Zhan, Masayoshi Tomizuka, Yixiao Wang

TL;DR

DADP is a diffusion-policy framework for generalizing learned policies to unseen transition dynamics. It (1) separates static domain information from time-varying dynamics without supervision, using Lagged Context Dynamical Prediction with a large temporal offset, and (2) uses the learned domain representation to bias the diffusion prior and to shape the diffusion target through a joint noise-representation term. The authors report robust zero-shot domain adaptation on challenging locomotion and manipulation benchmarks, with ablations attributing the gains to both the disentanglement (larger $\Delta t$) and the diffusion-modulation strategies (Mixed DDIM and predictive targets). The method is reported to reach state-of-the-art performance and to generalize well to out-of-distribution domains, and the datasets and code are open-sourced for the community.

Abstract

Learning domain-adaptive policies that can generalize to unseen transition dynamics remains a fundamental challenge in learning-based control. Substantial progress has been made through domain representation learning, which captures domain-specific information and thus enables domain-aware decision making. We analyze the process of learning domain representations through dynamical prediction and find that selecting contexts adjacent to the current step causes the learned representations to entangle static domain information with varying dynamical properties. Such a mixture can confuse the conditioned policy, thereby constraining zero-shot adaptation. To tackle this challenge, we propose DADP (Domain Adaptive Diffusion Policy), which achieves robust adaptation through unsupervised disentanglement and domain-aware diffusion injection. First, we introduce Lagged Context Dynamical Prediction, a strategy that conditions future state estimation on a historical offset context; by increasing this temporal gap, we disentangle static domain representations without supervision, filtering out transient properties. Second, we integrate the learned domain representations directly into the generative process by biasing the prior distribution and reformulating the diffusion target. Extensive experiments on challenging locomotion and manipulation benchmarks demonstrate the superior performance and generalizability of DADP over prior methods. More visualization results are available at https://outsider86.github.io/DomainAdaptiveDiffusionPolicy/.
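
The lagged-context idea described in the abstract can be illustrated with a minimal data-sampling sketch. This is not the authors' implementation: the function name `sample_lagged_pair`, the array layout, and the parameters `context_len` and `delta_t` are assumptions made for illustration, and the encoder, predictor, and loss are omitted. The sketch draws the context from the same trajectory for simplicity, whereas the paper's Figure 5 caption suggests the context may also come from another episode in the same domain.

```python
import numpy as np

def sample_lagged_pair(states, actions, t, context_len, delta_t):
    """Hypothetical helper: pair a lagged context with the transition at step t.

    states:  array of shape (T + 1, state_dim)
    actions: array of shape (T, action_dim)
    The context window ends delta_t steps before t. With delta_t = 0 the
    context is adjacent to the current step; a larger delta_t widens the gap.
    """
    ctx_end = t - delta_t              # exclusive end index of the context window
    ctx_start = ctx_end - context_len  # inclusive start index
    if ctx_start < 0 or t >= len(actions):
        raise ValueError("not enough history for this t, context_len, and delta_t")
    # Context: (state, action) pairs from the lagged window.
    context = np.concatenate(
        [states[ctx_start:ctx_end], actions[ctx_start:ctx_end]], axis=-1
    )
    # Prediction problem: (s_t, a_t) -> s_{t+1}, conditioned on the context.
    return context, (states[t], actions[t], states[t + 1])

# Usage on a toy trajectory of 10 steps.
rng = np.random.default_rng(0)
states = rng.normal(size=(11, 3))
actions = rng.normal(size=(10, 2))
context, (s_t, a_t, s_next) = sample_lagged_pair(
    states, actions, t=8, context_len=3, delta_t=4
)
print(context.shape)  # (3, 5)
```

A context encoder trained on such pairs only sees history that is `delta_t` steps old, so information that decorrelates over that gap (for example, the instantaneous velocity) cannot help predict the next state, while static domain properties still can.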

Paper Structure (30 sections, 18 equations, 12 figures, 8 tables, 1 algorithm)

This paper contains 30 sections, 18 equations, 12 figures, 8 tables, 1 algorithm.

Figures (12)

  • Figure 1: Averaged normalized performance of baselines in In-Distribution and Out-of-Distribution settings, across all tasks. The results are normalized by random and expert policy performance.
  • Figure 2: t-SNE visualization of the denoising process of standard diffusion and DADP. The points sampled from the prior distribution and the representations used are constructed from the training datasets and the learned context encoder.
  • Figure 3: t-SNE visualization of walker representations learned with different $\Delta t$.
  • Figure 4: Walker Online Adaptation Representation Visualizations
  • Figure 5: Intuition behind the $\Delta t$ design: since the varying velocity inferred from another episode in the same domain cannot assist the prediction, only the static gravity is extracted into the representation.
  • ...and 7 more figures