Cross-Domain Policy Transfer by Representation Alignment via Multi-Domain Behavioral Cloning

Hayato Watahiki, Ryo Iwase, Ryosuke Unno, Yoshimasa Tsuruoka

TL;DR

The paper tackles cross-domain policy transfer without requiring direct target-domain interaction by learning a domain-shared latent space and a single abstract policy through multi-domain behavioral cloning (BC). It regularizes latent-state alignment with maximum mean discrepancy (MMD) and, optionally, temporal cycle-consistency, and it keeps the encoder and decoder fixed while adapting to new tasks. Empirical results in cross-morphology and cross-viewpoint settings show that the proposed method, PLP, often outperforms domain-translation and adversarial baselines; BC itself plays a key role in alignment, and MMD preserves latent structure. The approach offers a simple, offline-compatible route to zero-shot transfer and points to extensions such as larger foundation policies and state-only demonstrations.

Abstract

Transferring learned skills across diverse situations remains a fundamental challenge for autonomous agents, particularly when agents are not allowed to interact with an exact target setup. While prior approaches have predominantly focused on learning domain translation, they often struggle with handling significant domain gaps or out-of-distribution tasks. In this paper, we present a simple approach for cross-domain policy transfer that learns a shared latent representation across domains and a common abstract policy on top of it. Our approach leverages multi-domain behavioral cloning on unaligned trajectories of proxy tasks and employs maximum mean discrepancy (MMD) as a regularization term to encourage cross-domain alignment. The MMD regularization better preserves structures of latent state distributions than commonly used domain-discriminative distribution matching, leading to higher transfer performance. Moreover, our approach involves training only one multi-domain policy, which makes extension easier than existing methods. Empirical evaluations demonstrate the efficacy of our method across various domain shifts, especially in scenarios where exact domain translation is challenging, such as cross-morphology or cross-viewpoint settings. Our ablation studies further reveal that multi-domain behavioral cloning implicitly contributes to representation alignment alongside domain-adversarial regularization.
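To make the MMD regularizer mentioned above concrete, the following is a minimal sketch of a squared-MMD estimate between two batches of latent states. The Gaussian kernel, the fixed `bandwidth`, the biased estimator, and the function names are all assumptions made for illustration; the text here does not specify which kernel or estimator the authors use.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix with entries exp(-||a_i - b_j||^2 / (2 * bandwidth^2))."""
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def mmd_squared(z_x, z_y, bandwidth=1.0):
    """Biased estimate of squared MMD between two batches of latent states.

    z_x: array of shape (n, d), latent states encoded from one domain.
    z_y: array of shape (m, d), latent states encoded from another domain.
    """
    k_xx = rbf_kernel(z_x, z_x, bandwidth)
    k_yy = rbf_kernel(z_y, z_y, bandwidth)
    k_xy = rbf_kernel(z_x, z_y, bandwidth)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()


# Toy check: batches from the same distribution give a smaller value
# than batches whose means differ.
rng = np.random.default_rng(0)
same_a = rng.normal(size=(64, 4))
same_b = rng.normal(size=(64, 4))
shifted = rng.normal(loc=3.0, size=(64, 4))
mmd_same = mmd_squared(same_a, same_b)
mmd_shifted = mmd_squared(same_a, shifted)
```

In a training loop, a term of this kind would be added to the behavioral cloning loss with a weighting coefficient, so that latent states from different domains are pulled toward a common distribution without a learned discriminator.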

Paper Structure

This paper contains 37 sections, 6 equations, 40 figures, 6 tables, 2 algorithms.

Figures (40)

  • Figure 1: Illustration of a shared representation space and a common policy. We can transfer knowledge across domains if semantically similar states are mapped to close points in the shared latent space. The state projection onto the common latent space, the prediction of latent actions from the common state space (yellow arrow), and the decoding of latent actions to each domain are modeled as a state encoder $\phi$, a common policy $\pi_z$, and an action decoder $\psi$ in our proposed method, respectively (cf. the overview figure).
  • Figure 2: Alignment phase. All modules are trainable.
  • Figure 3: Adaptation phase. Only common policy is updated.
  • Figure 5: P2P, BC Only
  • Figure 6: P2P, BC + MMD
  • ...and 35 more figures
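
The captions of Figures 1 to 3 describe a state encoder $\phi$, a common policy $\pi_z$, and an action decoder $\psi$, with all modules trained in the alignment phase and only the common policy updated in the adaptation phase. The sketch below illustrates that composition with linear maps. The domain names, dimensions, mean-squared-error loss, learning rate, and random "expert" data are hypothetical, and the decoder here takes only the latent action, which may be simpler than the authors' architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_STATE, LATENT_ACTION = 8, 4
# Hypothetical domains, each with its own (state_dim, action_dim).
DOMAINS = {"x": (6, 2), "y": (10, 3)}

# Domain-specific encoders (phi) and decoders (psi); one shared policy (pi_z).
phi = {d: rng.normal(scale=0.1, size=(s, LATENT_STATE)) for d, (s, _) in DOMAINS.items()}
psi = {d: rng.normal(scale=0.1, size=(LATENT_ACTION, a)) for d, (_, a) in DOMAINS.items()}
pi_z = rng.normal(scale=0.1, size=(LATENT_STATE, LATENT_ACTION))


def act(domain, states, policy):
    """Encode states, predict latent actions, decode to domain actions."""
    z_s = states @ phi[domain]
    z_a = z_s @ policy
    return z_a @ psi[domain]


def bc_loss(domain, states, expert_actions, policy):
    """Mean squared error between predicted and demonstrated actions."""
    return np.mean((act(domain, states, policy) - expert_actions) ** 2)


def adaptation_step(domain, states, expert_actions, policy, lr=0.1):
    """One gradient step on the common policy only; phi and psi stay frozen."""
    z_s = states @ phi[domain]
    err = z_s @ policy @ psi[domain] - expert_actions
    grad = 2.0 * z_s.T @ err @ psi[domain].T / err.size
    return policy - lr * grad


# Adapt the shared policy on demonstrations from domain "x" ...
states_x = rng.normal(size=(32, 6))
actions_x = rng.normal(size=(32, 2))
loss_before = bc_loss("x", states_x, actions_x, pi_z)
adapted = adaptation_step("x", states_x, actions_x, pi_z)
loss_after = bc_loss("x", states_x, actions_x, adapted)

# ... then reuse it in domain "y" through that domain's encoder and decoder.
actions_y = act("y", rng.normal(size=(5, 10)), adapted)
```

Because the adapted policy lives entirely in the latent space, it can be paired with any domain's frozen encoder and decoder, which is what makes zero-shot use in another domain possible once the latent spaces are aligned.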