Semi-Supervised Cross-Domain Imitation Learning
Li-Min Chu, Kai-Siang Ma, Ming-Hong Chen, Ping-Chun Hsieh
TL;DR
This work tackles cross-domain imitation learning under limited target-domain supervision by introducing the Semi-Supervised CDIL setting and AdaptDICE, an offline framework that transfers knowledge from a source domain with imperfect demonstrations to a target domain with few expert trajectories. The method combines a cross-domain mapping loss to bridge the domain gap, a hybrid density ratio for cross-domain policy extraction, and an adaptive weight β(t) to balance source and target contributions, all trained offline without requiring paired demonstrations. The authors establish convergence guarantees for density-ratio estimation and demonstrate consistent gains over baselines on MuJoCo and Robosuite, achieving stable, data-efficient policy learning with minimal supervision. The approach is practical for real-world deployment where collecting target-domain expert data is costly or hazardous, since it enables cross-domain imitation from few expert demonstrations and abundant imperfect data.
Abstract
Cross-domain imitation learning (CDIL) accelerates policy learning by transferring expert knowledge across domains, which is valuable in applications where the collection of expert data is costly. Existing methods are either supervised, relying on proxy tasks and explicit alignment, or unsupervised, aligning distributions without paired data, and the latter are often unstable. We introduce the Semi-Supervised CDIL (SS-CDIL) setting and propose the first algorithm for SS-CDIL with theoretical justification. Our method uses only offline data, including a small number of target expert demonstrations and some unlabeled imperfect trajectories. To handle domain discrepancy, we propose a novel cross-domain loss function for learning inter-domain state-action mappings and design an adaptive weight function to balance the source and target knowledge. Experiments on MuJoCo and Robosuite show consistent gains over the baselines, demonstrating that our approach achieves stable and data-efficient policy learning with minimal supervision. Our code is available at https://github.com/NYCU-RL-Bandits-Lab/CDIL.
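To make the idea of balancing source and target knowledge concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that the adaptive weight is a scalar in [0, 1] that depends on the training step, that the two density-ratio estimates are combined by a convex combination, and that the policy is extracted by weighted behavior cloning. The function names (`beta_schedule`, `hybrid_weights`, `weighted_bc_loss`) and the linear schedule are hypothetical; the actual forms used by the method are given in the paper and the released code.

```python
import numpy as np


def beta_schedule(step, total_steps, beta_start=1.0, beta_end=0.1):
    """Hypothetical linear schedule for the adaptive weight.

    Assumption: the weight moves from beta_start to beta_end as training
    progresses. The schedule used in the paper may differ.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return beta_start + frac * (beta_end - beta_start)


def hybrid_weights(w_source, w_target, beta):
    """Convex combination of two per-sample density-ratio estimates.

    Assumption: w_source comes from source data mapped into the target
    domain, and w_target from target-domain data.
    """
    return beta * w_source + (1.0 - beta) * w_target


def weighted_bc_loss(log_probs, weights):
    """Weighted negative log-likelihood with self-normalised weights."""
    weights = weights / weights.sum()
    return -(weights * log_probs).sum()


# Toy usage with made-up numbers.
w_src = np.array([2.0, 0.5])
w_tgt = np.array([4.0, 1.5])
beta = beta_schedule(step=50, total_steps=100)
weights = hybrid_weights(w_src, w_tgt, beta)
loss = weighted_bc_loss(np.array([-1.0, -3.0]), weights)
```

With a convex combination, a weight near 1 makes the policy rely mostly on the source-derived estimate, while a weight near 0 makes it rely mostly on the target-derived estimate.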
