
DIDA: Denoised Imitation Learning based on Domain Adaptation

Kaichen Huang, Hai-Hang Sun, Shenghua Wan, Minghao Shao, Shuai Feng, Le Gan, De-Chuan Zhan

TL;DR

This work tackles Learning from Noisy Demonstrations (LND) by introducing DIDA, a domain-adaptation-based imitation learning framework that learns task-relevant yet domain-robust representations from fully noisy data. It employs two discriminators (a noise discriminator and a policy discriminator) and a feature encoder, trained with a gradient-reversal objective and a mutual-information constraint, along with two practical components: Domain Adversarial Sampling (DAS) and Self-Adaptive Rate (SAR). A shuffle-based anchor buffer bridges the noisy and imitator domains, enabling effective domain adaptation without requiring random data collection in the expert domain. Empirical results on MuJoCo tasks (Hopper, Swimmer) across multiple noise types show that DIDA outperforms several baselines, demonstrating robust imitation under realistic noisy-data conditions and highlighting the value of domain adaptation for LND.

Abstract

Imitating skills from low-quality datasets, such as sub-optimal demonstrations and observations with distractors, is common in real-world applications. In this work, we focus on the problem of Learning from Noisy Demonstrations (LND), where the imitator must learn from data corrupted by noise that often arises during data collection or transmission. Previous imitation learning (IL) methods improve the robustness of learned policies by injecting adversarially learned Gaussian noise into pure expert data or by utilizing additional ranking information, but they may fail in the LND setting. To alleviate these problems, we propose Denoised Imitation Learning based on Domain Adaptation (DIDA), which uses two discriminators to distinguish the noise level and the expertise level of data, enabling a feature encoder to learn task-related but domain-agnostic representations. Experimental results on MuJoCo demonstrate that DIDA can successfully handle challenging imitation tasks from demonstrations with various types of noise, outperforming most baseline methods.
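How the feature encoder can learn from two discriminators with opposite goals is easiest to see through a gradient reversal layer (GRL), the DANN-style device that Figure 3 denotes by $R$. The sketch below is an illustration under assumptions, not the authors' code: it assumes $R$ sits only between the encoder $G_f$ and the noise discriminator $D_n$, and that the encoder receives the policy discriminator's gradient unchanged. The names `GradientReversal`, `encoder_grad`, and the scale `lam` are hypothetical, and gradients are plain Python lists rather than framework tensors.

```python
from typing import List, Sequence


class GradientReversal:
    """Gradient reversal layer (GRL) in the DANN style.

    Forward: identity. Backward: multiply the incoming gradient by -lam.
    """

    def __init__(self, lam: float = 1.0) -> None:
        self.lam = lam  # hypothetical reversal strength

    def forward(self, z: Sequence[float]) -> List[float]:
        # The embedding reaches D_n unchanged.
        return list(z)

    def backward(self, grad: Sequence[float]) -> List[float]:
        # The gradient of D_n's loss is negated (and scaled) before reaching G_f.
        return [-self.lam * g for g in grad]


def encoder_grad(grad_from_policy: Sequence[float],
                 grad_from_noise: Sequence[float],
                 lam: float = 1.0) -> List[float]:
    """Total gradient w.r.t. one embedding z that flows back into G_f.

    grad_from_policy: dL_p/dz from the policy discriminator D_p (passed through as-is).
    grad_from_noise:  dL_n/dz from the noise discriminator D_n (passed through the GRL).
    Descending the result lowers L_p and raises L_n: the encoder keeps features that
    help D_p while making noisy and clean embeddings harder for D_n to separate.
    """
    reversed_noise = GradientReversal(lam).backward(grad_from_noise)
    return [gp + gn for gp, gn in zip(grad_from_policy, reversed_noise)]


# D_p's gradient is kept; D_n's gradient is flipped before being added.
print(encoder_grad([0.5, -1.0], [0.25, 0.5]))  # [0.25, -1.5]
```

Because the reversal happens only in the backward pass, $D_n$ itself is still trained to classify noise levels correctly, while the encoder is pushed in the opposite direction; this min-max coupling is what yields task-related but domain-agnostic embeddings in the DANN formulation.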

Paper Structure (27 sections, 1 theorem, 13 equations, 15 figures, 2 tables, 1 algorithm)

Key Result

Proposition 4.1

(Limitation of GAIL) Given expert data with LTI noise and without perfect information, GAIL can handle only small-scale Gaussian noise and cannot handle other types of LTI noise; see the proof of Proposition 4.1.

Figures (15)

  • Figure 1: A simplified t-SNE (van der Maaten & Hinton, 2008) plot of states in the collected trajectories. We selected 10 data points from each class of the full t-SNE plot and display them using different colors and shapes. The arrows point to images rendered from the noisy underlying states, illustrating the significant impact of noise on observations.
  • Figure 2: We consider the expertise level and the noise level of data as the x-axis and the y-axis, respectively. $\tilde{\mathcal{B}}_E$ and $\mathcal{B}_A$ are located at diagonal positions.
  • Figure 3: The main framework of the DIDA method. The feature encoder $G_f$ maps a state $s$ to an embedding $z$. We sample batches of size $N$ from the imitator buffer $\mathcal{B}_I$, the noisy anchor buffer $\tilde{\mathcal{B}}_A$, and the noisy expert buffer $\tilde{\mathcal{B}}_E$ (defined in the problem setting) and feed them into $G_f$ to obtain embeddings $Z_I$, $\tilde{Z}_A$, and $\tilde{Z}_E$. The noise discriminator $D_n$ judges the noise level of all embeddings and outputs binary classification results. $R$ denotes the gradient reversal layer. We apply domain adversarial sampling (DAS) to compute the confusion probability distribution $P_{das}(z^I)$ over $Z_I$ based on the classification error probability of each embedding. We then draw $\alpha N$ embeddings from $Z_I$ according to $P_{das}$ and use them to replace $\alpha N$ randomly chosen embeddings in $\tilde{Z}_A$, where $\alpha$ is the self-adaptive rate (SAR); the resulting mixed batch is $Z_{\text{mix}}$. The policy discriminator $D_p$ judges the expertise level of $Z_{\text{mix}}$ and $\tilde{Z}_E$.
  • Figure 4: Top: the confusion probability $P_{das}$ of $Z_A$ at iterations 1, 1000, and 5000 (left to right), with samples sorted by decreasing degree of confusion from left to right. Bottom: t-SNE plots of embeddings from $Z_A$, $\tilde{Z}_R$, and $\tilde{Z}_E$ at iteration 1000; from left to right, the panels correspond to the embeddings with the highest, middle, and lowest degrees of confusion. More detailed t-SNE plots are provided in the appendix.
  • Figure 5: Evolution of $p_{\text{acc}}$ during training. Top: $\mathcal{B}_I$, $\tilde{\mathcal{B}}_A$, and $\tilde{\mathcal{B}}_E$ occupy only a tiny portion of the $n$-dimensional state space $\mathcal{X}$. When $D_n$ is randomly initialized, its classification hyperplane is likely to assign all three datasets to the same category, giving an accuracy of $\frac{2}{3}$ (e.g., $D_{n_0}$) or $\frac{1}{3}$ (e.g., $D_{n_1}$). Bottom left: the initial value is $p_2$, and it fluctuates between $p_1$ and $p_2$ late in training. Bottom right: the initial value is $p_1$, and it also fluctuates between $p_1$ and $p_2$.
  • ...and 10 more figures
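The DAS mixing step from the Figure 3 caption, and the $\frac{2}{3}$ / $\frac{1}{3}$ accuracies from the Figure 5 caption, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: embeddings are stand-in string labels, `p_noisy_imitator` is assumed to be $D_n$'s predicted probability that each imitator embedding is noisy (which equals $D_n$'s error probability, since imitator data is clean), $P_{das}$ is taken to be these error probabilities normalized over the batch, and `weighted_sample_without_replacement`, `das_mix`, and `constant_classifier_accuracy` are hypothetical names.

```python
import random
from typing import List, Sequence, Tuple


def weighted_sample_without_replacement(weights: Sequence[float], k: int,
                                        rng: random.Random) -> List[int]:
    """Draw k distinct indices; each draw is proportional to the remaining weights."""
    idx = list(range(len(weights)))
    w = list(weights)
    chosen: List[int] = []
    for _ in range(k):
        j = rng.choices(range(len(idx)), weights=w, k=1)[0]
        chosen.append(idx.pop(j))
        w.pop(j)
    return chosen


def das_mix(z_imitator: Sequence[str], z_anchor: Sequence[str],
            p_noisy_imitator: Sequence[float], alpha: float,
            rng: random.Random) -> Tuple[List[str], List[float]]:
    """Hypothetical DAS-style mixing of imitator embeddings into the anchor batch.

    Assumes at least alpha * N entries of p_noisy_imitator are nonzero.
    """
    n = len(z_anchor)
    k = int(alpha * n)  # alpha * N anchor slots are overwritten
    total = sum(p_noisy_imitator)
    # P_das over Z_I: D_n's error probabilities normalized into a distribution.
    p_das = [p / total for p in p_noisy_imitator]
    picked = weighted_sample_without_replacement(p_das, k, rng)
    slots = rng.sample(range(n), k)  # random positions in the anchor batch
    z_mix = list(z_anchor)
    for slot, i in zip(slots, picked):
        z_mix[slot] = z_imitator[i]
    return z_mix, p_das


def constant_classifier_accuracy(predict_noisy: bool) -> float:
    """Accuracy of a D_n that puts three equal-size batches in the same class."""
    # is_noisy labels for (B_I, noisy anchor batch, noisy expert batch)
    labels = [False, True, True]
    correct = sum(1 for y in labels if y == predict_noisy)
    return correct / len(labels)


rng = random.Random(0)
z_mix, p_das = das_mix(["i0", "i1", "i2", "i3"], ["a0", "a1", "a2", "a3"],
                       [0.9, 0.0, 0.0, 0.1], alpha=0.5, rng=rng)
print(sorted(z for z in z_mix if z.startswith("i")))  # ['i0', 'i3']
print(constant_classifier_accuracy(True), constant_classifier_accuracy(False))
```

Sampling without replacement keeps each selected imitator embedding unique within $Z_{\text{mix}}$, so the most confusing embeddings are favored without being duplicated; the captions do not state whether DIDA samples with or without replacement, so this is a design choice of the sketch. The accuracy helper reproduces the Figure 5 reasoning: two of the three batches are noisy, so labeling everything noisy scores $\frac{2}{3}$ and labeling everything clean scores $\frac{1}{3}$.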

Theorems & Definitions (2)

  • Proposition 4.1
  • Proof of Proposition 4.1