Data Fusion-Enhanced Decision Transformer for Stable Cross-Domain Generalization

Guojian Wang, Quinson Hon, Xuyang Chen, Lin Zhao

TL;DR

DFDT addresses cross-domain generalization for Decision Transformer policies by explicitly restoring token-level stitchability across source and target dynamics. It fuses target data with selectively trusted source fragments using a two-level MMD+OT filtering framework, replaces brittle RTG with advantage-conditioned tokens, and employs a Q-guided regularizer to smooth trajectory junctions. Theoretical bounds connect value and policy gaps to stitchability radii and estimation errors, while experiments across gravity, kinematic, and morphology shifts show superior returns and improved sequence stability over strong offline RL and sequence baselines. By operating in trajectory space with data-filtered fusion and advantage-based conditioning, DFDT offers a principled approach to robust cross-domain transfer in offline reinforcement learning.

Abstract

Cross-domain shifts present a significant challenge for decision transformer (DT) policies. Existing cross-domain policy adaptation methods typically rely on a single simple filtering criterion, matching either state structure or action feasibility, to select source trajectory fragments and stitch them together. However, the selected fragments still have poor stitchability: state structures can misalign, the return-to-go (RTG) becomes incomparable when the reward or horizon changes, and actions may jump at trajectory junctions. As a result, RTG tokens lose continuity, which compromises DT's inference ability. To tackle these challenges, we propose Data Fusion-Enhanced Decision Transformer (DFDT), a compact pipeline that restores stitchability. Specifically, DFDT fuses scarce target data with selectively trusted source fragments via a two-level data filter: maximum mean discrepancy (MMD) mismatch for state-structure alignment and optimal transport (OT) deviation for action feasibility. It then trains on a feasibility-weighted fusion distribution. Furthermore, DFDT replaces RTG tokens with advantage-conditioned tokens, which improves the semantic continuity of the token sequence. It also applies a $Q$-guided regularizer to suppress value and action jumps at junctions. Theoretically, we provide bounds that tie state value and policy performance gaps to the MMD-mismatch and OT-deviation measures, and show that the bounds tighten as these two measures shrink. We show that DFDT improves return and stability over strong offline RL and sequence-model baselines across gravity, kinematic, and morphology shifts on D4RL-style control tasks, and further corroborate these gains with token-stitching and sequence-semantics stability analyses.
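To make the state-structure gate concrete, the following is a minimal sketch of an MMD-based fragment filter. It is not the authors' implementation: the RBF kernel, the biased (V-statistic) estimator, the bandwidth, the threshold, and the names `rbf_kernel`, `mmd2`, and `gate_fragments` are all assumptions chosen for illustration, and the OT-based action-feasibility reweighting is omitted.

```python
import numpy as np

def rbf_kernel(x, y, bandwidth):
    # Pairwise squared Euclidean distances between rows of x and rows of y.
    sq = (np.sum(x**2, axis=1)[:, None]
          + np.sum(y**2, axis=1)[None, :]
          - 2.0 * x @ y.T)
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def mmd2(x, y, bandwidth=1.0):
    # Biased (V-statistic) estimate of the squared MMD between two samples.
    return (rbf_kernel(x, x, bandwidth).mean()
            + rbf_kernel(y, y, bandwidth).mean()
            - 2.0 * rbf_kernel(x, y, bandwidth).mean())

def gate_fragments(fragments, target_states, threshold, bandwidth=1.0):
    # Return indices of source fragments whose state sample is close
    # (in MMD) to the target state sample. The threshold is hypothetical.
    return [i for i, frag in enumerate(fragments)
            if mmd2(frag, target_states, bandwidth) <= threshold]

# Synthetic stand-ins for state samples (assumed 3-dimensional states).
rng = np.random.default_rng(0)
target = rng.normal(0.0, 1.0, size=(200, 3))   # scarce target states
near = rng.normal(0.0, 1.0, size=(50, 3))      # source fragment, similar states
far = rng.normal(3.0, 1.0, size=(50, 3))       # source fragment, shifted states

kept = gate_fragments([near, far], target, threshold=0.15)
print(kept)
```

A hard threshold is used here only for simplicity; the abstract describes a feasibility-weighted fusion distribution, so a soft weight derived from the same discrepancy would be a natural alternative under the same assumptions.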

Paper Structure

This paper contains 53 sections, 11 theorems, 100 equations, 2 figures, 12 tables, 2 algorithms.

Key Result

Theorem 4.1

Under Assumptions ass:est–ass:concentrability, training with $\mathbb P_{\mathrm{mix}}^{\,w}$ yields estimators $V$ and $Q$. Let $V_T$ and $Q_T$ be the state and state-action value functions learned from the target dataset. Let $\pi_T^\ast$ and $\pi_{\rm mix}$ denote any optimal policies learned fr… Moreover, by a performance-difference bound, …

Figures (2)

  • Figure 1: An overview of our proposed framework. Credible source fragments are first selected by an MMD-based state-structure gate and an OT-based action-feasibility reweighting, then fused with scarce target data and fed, together with advantage-conditioned tokens $A$, into a Decision Transformer whose attention heads predict stable actions $\hat{a}_t$ under cross-domain shifts.
  • Figure 2: Mean action jump, $Q$-value jump, and TD error during evaluation.

Theorems & Definitions (26)

  • Definition 3.1: Two-level feasibility-weighted data fusion framework
  • Definition 4.1: Stitchability radii
  • Theorem 4.1: Performance bound under stitchability radii
  • Definition B.1: Pushforward measure
  • Definition B.2: Kantorovich–Rubinstein duality
  • Lemma B.1: Expectation deviation under the weighted data fusion
  • proof
  • Definition C.1: Polish space
  • Definition C.2: Quotient map and induced map
  • Lemma C.1: Continuity of the quotient map and induced factor
  • ...and 16 more