DIDA: Denoised Imitation Learning based on Domain Adaptation
Kaichen Huang, Hai-Hang Sun, Shenghua Wan, Minghao Shao, Shuai Feng, Le Gan, De-Chuan Zhan
TL;DR
This work tackles Learning from Noisy Demonstrations (LND) by introducing DIDA, a domain-adaptation-based imitation learning framework that learns task-relevant yet domain-robust representations from fully noisy data. It employs two discriminators (noise and policy) and a feature encoder, guided by a gradient-reversal objective and a mutual-information constraint, along with two practical components: Domain Adversarial Sampling (DAS) and Self-Adaptive Rate (SAR). A shuffle-based anchor buffer bridges the noisy and imitator domains, enabling effective domain adaptation without requiring random data collection in the expert domain. Empirical results on MuJoCo tasks (Hopper, Swimmer) across multiple noise types show that DIDA outperforms several baselines, demonstrating robust imitation under realistic noisy data conditions and highlighting the value of domain adaptation in LND contexts.
Abstract
Imitating skills from low-quality datasets, such as sub-optimal demonstrations and observations with distractors, is common in real-world applications. In this work, we focus on the problem of Learning from Noisy Demonstrations (LND), where the imitator is required to learn from data corrupted by noise that often arises during data collection or transmission. Previous IL methods improve the robustness of learned policies by injecting adversarially learned Gaussian noise into pure expert data or by utilizing additional ranking information, but they may fail in the LND setting. To address this, we propose Denoised Imitation Learning based on Domain Adaptation (DIDA), which designs two discriminators to distinguish the noise level and the expertise level of data, helping a feature encoder learn task-related but domain-agnostic representations. Experimental results on MuJoCo demonstrate that DIDA can successfully handle challenging imitation tasks from demonstrations with various types of noise, outperforming most baseline methods.
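The interplay between the two discriminators and the feature encoder can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a linear encoder, linear discriminators with sigmoid outputs, binary cross-entropy losses, and that the gradient reversal is applied to the noise-discriminator branch, as in standard domain-adversarial training. All names (`encoder_gradient`, `W`, `u`, `v`, `lam`) are hypothetical, and the mutual-information constraint, DAS, and SAR are omitted.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def bce_grad_logit(logit, label):
    # Derivative of sigmoid + binary cross-entropy with respect to the logit.
    return sigmoid(logit) - label

def encoder_gradient(W, u, v, x, policy_label, noise_label, lam=1.0):
    """Gradient for encoder weights W under a gradient-reversal objective.

    Assumed toy model (hypothetical, for illustration only):
      z = W @ x            encoder features
      u @ z                policy (expertise) discriminator logit
      v @ z                noise (domain) discriminator logit
    """
    z = W @ x
    g_policy = bce_grad_logit(u @ z, policy_label) * u  # dL_policy / dz
    g_noise = bce_grad_logit(v @ z, noise_label) * v    # dL_noise / dz
    # Gradient reversal: the encoder descends the policy loss but ascends
    # the noise loss, which pushes features toward being noise-agnostic.
    g_z = g_policy - lam * g_noise
    return np.outer(g_z, x)                             # dL / dW

# Usage: one gradient evaluation on a toy input.
W = np.zeros((2, 3))
x = np.array([1.0, 2.0, 3.0])
u = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])
grad = encoder_gradient(W, u, v, x, policy_label=1.0, noise_label=1.0)
```

Setting `lam=0` in this sketch removes the adversarial term, so the encoder would be trained on the policy signal alone; larger values weight noise-invariance more heavily.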
