Imitation Learning from Purified Demonstrations

Yunke Wang; Minjing Dong; Yukun Zhao; Bo Du; Chang Xu

TL;DR

The paper tackles imitation learning with imperfect demonstrations by introducing DP-IL, a diffusion-based purification framework that first diffuses suboptimal data to remove perturbation patterns and then uses a learned reverse diffusion to recover purified demonstrations. A diffusion model is trained on a small set of optimal demonstrations and applied to purify the larger set of suboptimal ones, enabling the agent to learn from a closer approximation to the expert distribution via occupancy-measure matching. The authors provide theoretical bounds on the distance between purified and optimal demonstrations and demonstrate that DP-IL improves performance in both offline (BC) and online (GAIL) settings on MuJoCo and RoboSuite, across various noise types and demonstration qualities. The method is modular and can be integrated into existing IL frameworks, offering a practical path to robust policy learning when optimal data is scarce.

Abstract

Imitation learning has emerged as a promising approach to sequential decision-making problems, under the assumption that expert demonstrations are optimal. In real-world scenarios, however, demonstrations are often imperfect, which limits the effectiveness of imitation learning. Existing research has focused on optimizing with imperfect demonstrations, but training typically requires a certain proportion of optimal demonstrations to guarantee performance. To tackle these problems, we propose first purifying the potential noise in imperfect demonstrations and then conducting imitation learning from the purified demonstrations. Motivated by the success of diffusion models, we introduce a two-step purification via a diffusion process. In the first step, we apply a forward diffusion process that smooths potential noise in imperfect demonstrations by introducing additional noise. In the second step, a reverse generative process recovers the optimal demonstrations from the diffused ones. We provide theoretical evidence supporting our approach, showing that the distance between the purified and optimal demonstrations can be bounded. Empirical results on MuJoCo and RoboSuite demonstrate the effectiveness of our method from different aspects.
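The two-step purification described above can be sketched on toy data. This is a minimal illustration, not the authors' implementation: it assumes a DDPM-style discrete forward process with a linear noise schedule, 1-D "demonstrations", and optimal data that is exactly Gaussian, so that an analytic noise predictor (`gaussian_eps_model`) can stand in for the diffusion model the paper trains on optimal demonstrations. All function names, the schedule, and the numeric values are hypothetical.

```python
import numpy as np

def make_schedule(num_steps=100, beta_min=1e-4, beta_max=0.05):
    """Linear noise schedule (an assumed choice, not taken from the paper)."""
    betas = np.linspace(beta_min, beta_max, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def gaussian_eps_model(x_t, t, alpha_bars, mu, sigma):
    """Stand-in for a learned noise predictor.

    Returns the exact E[eps | x_t] when clean data follows N(mu, sigma^2).
    """
    ab = alpha_bars[t]
    var_t = ab * sigma**2 + (1.0 - ab)  # marginal variance of x_t
    return np.sqrt(1.0 - ab) * (x_t - np.sqrt(ab) * mu) / var_t

def purify(x_sub, t_r, schedule, eps_model, rng):
    """Step 1: diffuse x_sub for t_r steps. Step 2: denoise back to step 0."""
    betas, alphas, alpha_bars = schedule
    ab = alpha_bars[t_r - 1]
    # Forward diffusion in closed form: added noise smooths the perturbation.
    x = np.sqrt(ab) * x_sub + np.sqrt(1.0 - ab) * rng.standard_normal(x_sub.shape)
    # Reverse process: DDPM ancestral sampling with variance beta_t.
    for t in range(t_r - 1, -1, -1):
        eps_hat = eps_model(x, t)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x

rng = np.random.default_rng(0)
mu, sigma = 1.0, 0.1                       # toy "optimal" distribution
schedule = make_schedule()
x_opt = mu + sigma * rng.standard_normal(2000)
x_sub = x_opt + 0.5 * rng.standard_normal(2000)   # perturbed demonstrations

eps_model = lambda x, t: gaussian_eps_model(x, t, schedule[2], mu, sigma)
x_pure = purify(x_sub, t_r=60, schedule=schedule, eps_model=eps_model, rng=rng)

mae_sub = np.mean(np.abs(x_sub - mu))      # error before purification
mae_pure = np.mean(np.abs(x_pure - mu))    # error after purification
print(f"mean abs deviation: suboptimal={mae_sub:.3f}, purified={mae_pure:.3f}")
```

The diffusion depth `t_r` controls the trade-off: a larger value washes out more of the perturbation but also more of the original demonstration, which is why the choice of the diffusion time matters.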

Paper Structure (33 sections, 9 theorems, 37 equations, 4 figures, 9 tables, 2 algorithms)

Introduction
Related Work
Imitation Learning from Imperfect Demonstrations
Diffusion Model in Imitation Learning
Preliminary
Markov Decision Process (MDP)
Imitation Learning via Distribution Matching
Methodology
General Objective
Purification via Diffusion Process
Training Diffusion Model with Optimal Demos.
Purifying Sub-optimal Demonstrations.
Choice of optimal $t_r$
Theoretical Analysis
Experiments
...and 18 more sections

Key Result

Theorem 1

Let $\{x_t\}_{t\in [0,1]}$ be samples in the forward diffusion process. If we denote $\rho_{\pi_o,t}(x)$ and $\rho_{\pi_s,t}(x)$ as the respective distributions of $x_t$ when $x_{o,0}\sim \rho_{\pi_o,t=0}(x)$ and $x_{s,0}\sim \rho_{\pi_s,t=0}(x)$, we then have, where $\varsigma=\frac{\partial D_{KL}(\rho_{\pi_o,t}(x)\,\|\,\rho_{\pi_s,t}(x))}{\partial t}$ denotes the derivative of the KL divergence with respect to $t$ ...
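As a rough numerical illustration of the quantity $\varsigma$ defined above, the sketch below tracks $D_{KL}(\rho_{\pi_o,t}\,\|\,\rho_{\pi_s,t})$ as both distributions pass through the same forward noising process. It assumes, purely for illustration, that both occupancy measures are 1-D Gaussians and that the forward process is variance-preserving; the means, variances, and helper names are hypothetical. Because both distributions go through the same noising channel, the data-processing inequality implies the KL divergence cannot increase, so its finite-difference slope in this toy case is non-positive.

```python
import numpy as np

def kl_gauss(m1, v1, m2, v2):
    """KL( N(m1, v1) || N(m2, v2) ) for 1-D Gaussians with variances v1, v2."""
    return 0.5 * (np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def diffuse(m, v, alpha_bar):
    """Mean and variance after x_t = sqrt(alpha_bar) x_0 + sqrt(1 - alpha_bar) eps."""
    return np.sqrt(alpha_bar) * m, alpha_bar * v + (1.0 - alpha_bar)

# alpha_bar decreases from 1 (no noise) toward 0 (almost pure noise).
alpha_bars = np.linspace(1.0, 0.01, 50)

# Toy optimal distribution N(1, 0.01) and a noisier suboptimal one N(1, 0.26).
kls = [
    kl_gauss(*diffuse(1.0, 0.01, ab), *diffuse(1.0, 0.26, ab))
    for ab in alpha_bars
]

# Finite-difference analogue of the derivative along the noise schedule.
slopes = np.diff(kls)
print(f"KL at start: {kls[0]:.4f}, KL at end: {kls[-1]:.6f}")
print("all slopes non-positive:", bool(np.all(slopes <= 1e-12)))
```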

Figures (4)

Figure 1: The training curve of DP-GAIL and other online imitation learning methods with D1-L1 demonstrations. The x-axis is the number of interactions with the environment and the shaded area indicates the standard error.
Figure 2: Impact of diffusion time $t_r$ with demonstrations of different optimality.
Figure 3: Visualization of the SawyerNutAssembly task on the RoboSuite platform and the quality of human demonstrations.
Figure 4: The corresponding curve of the elasticity $E_{\sigma, C_{\sigma,d}}$ with respect to $\sigma$.

Theorems & Definitions (15)

Definition 1
Theorem 1
Theorem 2
Proposition 3
Theorem 4
Theorem 1
proof
Theorem 2
proof
Proposition 3
...and 5 more
