Cross-Domain Imitation Learning via Optimal Transport
Arnaud Fickinger, Samuel Cohen, Stuart Russell, Brandon Amos
TL;DR
GWIL tackles cross-domain imitation learning by using the Gromov-Wasserstein distance $\mathcal{GW}$ to compare occupancy measures across incomparable state–action spaces. A proxy reward $r_{\mathcal{GW}}$, constructed from the optimal coupling, is then used to train the imitation policy with RL. Theoretical results show that minimizing $\mathcal{GW}$ recovers an optimal policy up to an isometry, under suitable conditions on the metrics and embeddings. Empirically, a single expert trajectory suffices to reach near-optimal behavior in continuous-control domains that are rigidly, mildly, or highly transformed versions of the expert's domain. By removing the need for paired demonstrations or proxy tasks, the framework broadens the applicability of imitation learning, with potential impact on transferring skills between humans and robots with different morphologies.
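For reference, one common (squared-loss) form of the discrete Gromov-Wasserstein distance is written out here, with expert samples $x_1,\dots,x_n$ (weights $p$, metric $d_X$) and agent samples $y_1,\dots,y_m$ (weights $q$, metric $d_Y$). The per-sample reward $\tilde r$ is an assumed construction for illustration only, not necessarily the exact form of $r_{\mathcal{GW}}$; it simply splits the GW cost of the optimal coupling $\theta^\star$ across the agent's samples.

```latex
\mathcal{GW}^2(p, q) = \min_{\theta \in \Pi(p, q)} \sum_{i,i'=1}^{n} \sum_{j,j'=1}^{m}
  \big| d_X(x_i, x_{i'}) - d_Y(y_j, y_{j'}) \big|^2 \, \theta_{ij} \, \theta_{i'j'},
\qquad
\Pi(p, q) = \{ \theta \in \mathbb{R}_{\ge 0}^{n \times m} : \theta \mathbf{1}_m = p, \ \theta^\top \mathbf{1}_n = q \}

% Assumed per-sample reward for agent sample y_j, given an optimal coupling \theta^\star:
\tilde r(y_j) = -\sum_{i,i'=1}^{n} \sum_{j'=1}^{m}
  \big| d_X(x_i, x_{i'}) - d_Y(y_j, y_{j'}) \big|^2 \, \theta^\star_{ij} \, \theta^\star_{i'j'},
\qquad
\sum_{j=1}^{m} \tilde r(y_j) = -\mathcal{GW}^2(p, q)
```

Because the objective only compares distances within each space, any isometry between the two supports yields a coupling with zero cost, which is why optimality can only be recovered up to an isometry.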
Abstract
Cross-domain imitation learning studies how to leverage expert demonstrations of one agent to train an imitation agent with a different embodiment or morphology. Comparing trajectories and stationary distributions between the expert and imitation agents is challenging because the agents live in different systems whose spaces may not even have the same dimensionality. We propose Gromov-Wasserstein Imitation Learning (GWIL), a method for cross-domain imitation that uses the Gromov-Wasserstein distance to align and compare states between the different spaces of the agents. Our theory formally characterizes the scenarios where GWIL preserves optimality, revealing its possibilities and limitations. We demonstrate the effectiveness of GWIL in non-trivial continuous control domains, ranging from simple rigid transformations of the expert domain to arbitrary transformations of the state-action space.
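To make the alignment step concrete, here is a minimal NumPy sketch under stated assumptions: expert and agent samples are plain vectors standing in for concatenated state-action pairs, both spaces use the Euclidean metric rescaled by its maximum, and the coupling is approximated by a simple entropic scheme that repeatedly linearizes the GW objective and solves an entropic OT problem with Sinkhorn iterations. All function names (`pairwise_dist`, `tensor_product`, `entropic_gw`, `proxy_rewards`) are hypothetical, and the per-sample reward is an illustrative assumption, not necessarily the paper's exact $r_{\mathcal{GW}}$. The toy data mirrors the rigid-transformation setting: the agent's samples are a rotated and translated copy of the expert's, embedded in a space of higher dimension.

```python
import numpy as np


def pairwise_dist(x):
    """Euclidean distance matrix between the rows of x (shape n x d)."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def tensor_product(cx, cy, theta):
    """Return L[i, j] = sum_{k,l} (cx[i,k] - cy[j,l])**2 * theta[k,l].

    Uses (a - b)**2 = a**2 + b**2 - 2ab so no 4-D tensor is materialized.
    """
    p, q = theta.sum(axis=1), theta.sum(axis=0)
    return ((cx ** 2) @ p)[:, None] + ((cy ** 2) @ q)[None, :] - 2.0 * cx @ theta @ cy.T


def gw_objective(cx, cy, theta):
    """Squared-loss GW objective evaluated at a given coupling theta."""
    return float(np.sum(theta * tensor_product(cx, cy, theta)))


def sinkhorn(p, q, cost, eps, n_iter=500):
    """Entropic OT plan with marginals p, q for the given cost matrix."""
    kernel = np.exp(-(cost - cost.min()) / eps)  # constant shift for numerical stability
    u = np.ones_like(p)
    for _ in range(n_iter):
        v = q / (kernel.T @ u)
        u = p / (kernel @ v)
    return u[:, None] * kernel * v[None, :]


def entropic_gw(cx, cy, p, q, eps=0.05, n_outer=50):
    """Approximate a GW coupling by re-solving a linearized entropic OT problem."""
    theta = np.outer(p, q)  # start from the independent coupling
    for _ in range(n_outer):
        theta = sinkhorn(p, q, tensor_product(cx, cy, theta), eps)
    return theta


def proxy_rewards(cx, cy, theta):
    """Hypothetical reward per agent sample j: minus its share of the GW distortion."""
    return -(theta * tensor_product(cx, cy, theta)).sum(axis=0)


# Toy check: the agent's samples are a rigid motion of the expert's, embedded in R^3.
rng = np.random.default_rng(0)
n = 6
expert = rng.normal(size=(n, 2))
angle = 0.7
rot = np.array([[np.cos(angle), -np.sin(angle)], [np.sin(angle), np.cos(angle)]])
agent = np.hstack([expert @ rot.T + 3.0, np.zeros((n, 1))])

cx = pairwise_dist(expert)
cx /= cx.max()
cy = pairwise_dist(agent)
cy /= cy.max()
p = q = np.full(n, 1.0 / n)

identity = np.eye(n) / n  # couples each expert sample with its own image
theta = entropic_gw(cx, cy, p, q)
rewards = proxy_rewards(cx, cy, theta)
print("GW objective at the isometry:", gw_objective(cx, cy, identity))
print("GW objective at the entropic solution:", gw_objective(cx, cy, theta))
```

Because only intra-space distances enter the objective, a zero-cost coupling exists even though the two point clouds live in spaces of different dimension. The entropic solver, by contrast, is only approximate and may settle in a local minimum, since the GW problem is non-convex; the rewards always sum to minus the GW objective of whatever coupling is used.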
