Bridging Dynamics Gaps via Diffusion Schrödinger Bridge for Cross-Domain Reinforcement Learning

Hanping Zhang; Yuhong Guo


TL;DR

Bridging Dynamics Gaps for Cross-Domain Reinforcement Learning (BDGxRL) is a framework that leverages the Diffusion Schrödinger Bridge to align source transitions with the target-domain dynamics encoded in offline demonstrations, and introduces a reward modulation mechanism that estimates rewards from state transitions.

Abstract

Cross-domain reinforcement learning (RL) aims to learn transferable policies under dynamics shifts between source and target domains. A key challenge is the lack of target-domain environment interaction and reward supervision, which prevents direct policy learning. To address this challenge, we propose Bridging Dynamics Gaps for Cross-Domain Reinforcement Learning (BDGxRL), a novel framework that leverages the Diffusion Schrödinger Bridge (DSB) to align source transitions with the target-domain dynamics encoded in offline demonstrations. We further introduce a reward modulation mechanism that estimates rewards from state transitions and is applied to DSB-aligned samples, keeping rewards consistent with target-domain dynamics. BDGxRL performs target-oriented policy learning entirely within the source domain, without access to the target environment or its rewards. Experiments on MuJoCo cross-domain benchmarks demonstrate that BDGxRL outperforms state-of-the-art baselines and shows strong adaptability under transition dynamics shifts.
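As a rough illustration of the reward modulation mechanism, the sketch below fits a transition-aware reward model $R(s_t, s_{t+1})$ on source transitions and then queries it at an aligned next state. It assumes a linear model over hypothetical features $[s_t, s_{t+1}, 1]$ fitted by least squares; the paper's reward model is not specified here and is likely a learned neural network, and the names `transition_features`, `fit_transition_reward`, and `modulated_reward` are illustrative only.

```python
import numpy as np


def transition_features(s, s_next):
    """Hypothetical feature map phi(s, s') = [s, s', 1] for a linear reward model."""
    s = np.atleast_2d(s)
    s_next = np.atleast_2d(s_next)
    bias = np.ones((s.shape[0], 1))
    return np.hstack([s, s_next, bias])


def fit_transition_reward(states, next_states, rewards):
    """Least-squares fit of R(s, s') ~= w^T phi(s, s') on source-domain transitions."""
    phi = transition_features(states, next_states)
    w, *_ = np.linalg.lstsq(phi, rewards, rcond=None)
    return w


def modulated_reward(w, s, s_next_aligned):
    """Modulated reward r~ = R(s, s~'), evaluated at the aligned next state."""
    return transition_features(s, s_next_aligned) @ w
```

Because the reward depends only on $(s_t, s_{t+1})$, it can be re-evaluated on a translated next state without any target-domain reward labels.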


Paper Structure

This paper contains 31 sections, 3 theorems, 28 equations, 2 figures, 3 tables, and 1 algorithm.

Introduction
Related Work
Cross-Domain Reinforcement Learning
Diffusion Schrödinger Bridge
Preliminaries
Diffusion Schrödinger Bridge
Iterative Markov Fitting
Method
Problem Setting
DSB-based Dynamics Alignment
Transition Representation
DSB Training
Forward Process
Backward Process
Dynamics Alignment
...and 16 more sections

Key Result

Theorem 1

Assume the reward is bounded by $R_{\max}$ and the discount factor satisfies $\gamma \in [0,1)$. Let $\pi$ be the policy learned using BDGxRL with DSB-based dynamics translation and reward modulation, and let $\pi^\star$ denote the optimal policy in the target environment. Then, when the number of … where $\epsilon_m = \mathcal{O}\left( \frac{1}{N_S} + \frac{1}{N_T} \right)$ denotes the dynamics a…

Figures (2)

Figure 1: Overview of the proposed BDGxRL framework. The agent first collects a dataset $\mathcal{D}_S$ from the source environment, which is used to train a transition-aware reward model $R(s_t,s_{t+1})$. Together with offline target demonstrations $\mathcal{D}_T$, it also trains a DSB model for dynamics alignment. During online interactions, source transitions are translated into target-style transitions via $\tilde{s}_{t+1}\sim\mathrm{DSB}(s_t,a_t,s_{t+1})$ to mitigate dynamics mismatch. The modulated reward $\tilde{r}_t=R(s_t,\tilde{s}_{t+1})$ is then used to learn a target-oriented policy entirely within the source domain, initialized via imitation from $\mathcal{D}_T$.
Figure 2: Overall average performance of each method across all tasks, domain gaps, and demonstration levels.
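The data flow in Figure 1 can be sketched as a collection loop: the agent acts in the source environment, each real next state is translated into a target-style next state, and the modulated reward is computed on that translated state before storage. This is a minimal sketch under stated assumptions: `translate` stands in for the trained DSB sampler and `reward_model` for $R(s_t, s_{t+1})$, both passed in as hypothetical callables, and the rollout continues from the real source state. The helper names `AlignedBuffer` and `collect_aligned` are illustrative, and the imitation-based policy initialization and the policy update are omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

import numpy as np

# One stored transition: (s_t, a_t, r~_t, s~_{t+1})
Transition = Tuple[np.ndarray, np.ndarray, float, np.ndarray]


@dataclass
class AlignedBuffer:
    """Hypothetical replay storage for DSB-aligned transitions."""
    data: List[Transition] = field(default_factory=list)

    def add(self, s: np.ndarray, a: np.ndarray, r: float, s_next: np.ndarray) -> None:
        self.data.append((s, a, r, s_next))


def collect_aligned(
    step_fn: Callable[[np.ndarray, np.ndarray], np.ndarray],
    policy: Callable[[np.ndarray], np.ndarray],
    translate: Callable[[np.ndarray, np.ndarray, np.ndarray], np.ndarray],
    reward_model: Callable[[np.ndarray, np.ndarray], float],
    s0: np.ndarray,
    horizon: int,
    buffer: AlignedBuffer,
) -> AlignedBuffer:
    """Roll out in the source domain and store target-aligned transitions."""
    s = s0
    for _ in range(horizon):
        a = policy(s)
        s_next = step_fn(s, a)                      # real source-domain next state
        s_tilde = translate(s, a, s_next)           # stand-in for s~_{t+1} ~ DSB(s_t, a_t, s_{t+1})
        r_tilde = float(reward_model(s, s_tilde))   # modulated reward r~_t = R(s_t, s~_{t+1})
        buffer.add(s, a, r_tilde, s_tilde)
        s = s_next                                  # the rollout itself stays in the source env
    return buffer
```

Keeping `translate` and `reward_model` as arguments means an off-policy learner can consume `buffer.data` unchanged; only the stored next state and reward differ from a standard replay buffer.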

Theorems & Definitions (5)

Theorem 1: Policy Value Bound under DSB Translation
Lemma 1: Transition Model Error Bound
Proof
Theorem 2: Policy Value Bound under DSB Translation
Proof
