xTED: Cross-Domain Adaptation via Diffusion-Based Trajectory Editing

Haoyi Niu, Qimao Chen, Tenglong Liu, Jianxiong Li, Guyue Zhou, Yi Zhang, Jianming Hu, Xianyuan Zhan

TL;DR

The paper tackles data scarcity and domain gaps in reinforcement and imitation learning by proposing xTED, a diffusion-based trajectory editing framework that operates at the data level rather than the policy level. A target-domain diffusion prior is learned, and source-domain trajectories are edited by adding noise and then denoising with this prior. The editing strength is controlled by a ratio $\kappa = k/K$, typically around 0.5, which preserves primitive task information. The method uses separate encoders/decoders for states, actions, and rewards, together with dependency-aware attention to capture temporal and causal dynamics, enabling effective cross-domain adaptation when integrated with downstream IL/RL methods. Empirical results in simulation and real-robot experiments show that edited source data consistently improves policy learning over target-only or unedited-source baselines, and that xTED can also serve as a powerful single-domain data augmenter. Overall, xTED offers a flexible, task- and domain-agnostic data-level solution that can synergize with existing cross-domain methods and downstream policies for robust transfer in robotics and decision-making tasks.
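
The noise-then-denoise editing step can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a standard DDPM-style linear noise schedule, reads $\kappa = k/K$ as the fraction of the $K$ diffusion steps used for editing, treats a trajectory as a single array of concatenated (state, action, reward) features, and replaces the trained target-domain diffusion model with a toy analytic Gaussian prior. All function names (`make_schedule`, `gaussian_prior_eps`, `edit_trajectory`) are hypothetical.

```python
import numpy as np

def make_schedule(K=100, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule (an assumption; the paper's schedule may differ)."""
    betas = np.linspace(beta_start, beta_end, K)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def gaussian_prior_eps(x, t, alpha_bars, mu, sigma):
    """Toy stand-in for the trained target-domain model: the exact
    noise predictor when target data are distributed as N(mu, sigma^2 I)."""
    ab = alpha_bars[t]
    var_t = ab * sigma**2 + (1.0 - ab)  # marginal variance of x_t
    return np.sqrt(1.0 - ab) * (x - np.sqrt(ab) * mu) / var_t

def edit_trajectory(x_src, eps_model, kappa=0.5, K=100, rng=None):
    """Noise a source trajectory for k = round(kappa * K) steps, then denoise
    it with a target-domain noise predictor eps_model(x, t, alpha_bars)."""
    if not 0.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie in [0, 1]")
    rng = np.random.default_rng() if rng is None else rng
    betas, alphas, alpha_bars = make_schedule(K)
    k = int(round(kappa * K))
    if k == 0:
        return x_src.copy()  # no editing: the source is returned unchanged

    # Forward process: jump directly to noise level k (array index k - 1).
    ab_k = alpha_bars[k - 1]
    x = np.sqrt(ab_k) * x_src + np.sqrt(1.0 - ab_k) * rng.standard_normal(x_src.shape)

    # Reverse process: ancestral sampling from step k back to step 0.
    for t in range(k - 1, -1, -1):
        eps_hat = eps_model(x, t, alpha_bars)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x

# Usage: a source trajectory far from a toy target distribution centered at 0.
x_src = np.full((16, 4), 3.0)  # 16 time steps, 4 features per step
prior = lambda x, t, ab: gaussian_prior_eps(x, t, ab, mu=0.0, sigma=0.1)
edited = edit_trajectory(x_src, prior, kappa=0.5, rng=np.random.default_rng(0))
```

In this sketch, a larger `kappa` discards more of the source trajectory and pulls the result further toward the target prior, while `kappa = 0` leaves the source untouched. This mirrors the trade-off described above between aligning with the target domain and preserving the original task information.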

Abstract

Reusing pre-collected data from different domains is an appealing solution for decision-making tasks, especially when data in the target domain are limited. Existing cross-domain policy transfer methods mostly aim at learning domain correspondences or corrections to facilitate policy learning, such as learning task/domain-specific discriminators, representations, or policies. This design philosophy often results in heavy model architectures or task/domain-specific modeling, lacking flexibility. This reality makes us wonder: can we directly bridge the domain gaps universally at the data level, instead of relying on complex downstream cross-domain policy transfer procedures? In this study, we propose the Cross-Domain Trajectory EDiting (xTED) framework that employs a specially designed diffusion model for cross-domain trajectory adaptation. Our proposed model architecture effectively captures the intricate dependencies among states, actions, and rewards, as well as the dynamics patterns within target data. Edited by adding noise and denoising with the pre-trained diffusion model, source-domain trajectories can be transformed to align with target-domain properties while preserving original semantic information. This process effectively corrects underlying domain gaps, enhancing state realism and dynamics reliability in source data, and allowing flexible integration with various single-domain and cross-domain downstream policy learning methods. Despite its simplicity, xTED demonstrates superior performance in extensive simulation and real-robot experiments.

Paper Structure (38 sections, 11 equations, 5 figures, 14 tables)

Figures (5)

  • Figure 1: While sharing conceptual similarities with image editing, trajectory editing introduces distinct challenges due to the inherent complexity of sequential decision-making data, such as heterogeneous elements and complex internal dependencies.
  • Figure 2: The model architecture is designed for capturing heterogeneous physical meanings of decision-making elements (states, actions, and rewards) and their intricate temporal and internal dependencies.
  • Figure 3: Target and source domains with complicated discrepancies in embodiments and viewpoints (top) and experimental results (bottom). The top right shows snapshots from the base and wrist camera views during data collection in the target and source domains for the Cup, Duck, and Pot tasks. Average success rates for real-robot tasks with and without distractors are computed over 3 seeds.
  • Figure 4: Average normalized scores for single-domain data augmentation. Averaged over 5 random seeds.
  • Figure 5: Average normalized returns under different degrees of dynamics gap (different thigh sizes) on 20k transitions from WK-MR and HC-MR.