Denoising-based Contractive Imitation Learning

Macheng Shen, Jishen Peng, Zefang Huang

TL;DR

This work tackles covariate shift in imitation learning by promoting stability in state transitions using a denoising mechanism. It introduces DeCIL, a simple two-network approach consisting of a dynamics predictor $f$ and a denoising policy network $d$ that refines next-state predictions so that the state-transition mapping becomes contractive, supported by a Jacobian-based theoretical analysis. The denoising objective drives the Lipschitz constant of the denoising component below one, which reduces error propagation and drift. Experiments on Intersection and MetaWorld tasks show improved robustness to noise and better performance in data-scarce settings. The method integrates with existing imitation-learning pipelines and requires neither additional expert data nor complex training regimes, which makes it practical for real-world applications.

Abstract

A fundamental challenge in imitation learning is the covariate shift problem. Existing methods to mitigate covariate shift often require additional expert interactions, access to environment dynamics, or complex adversarial training, which may not be practical in real-world applications. In this paper, we propose a simple yet effective method (DeCIL) to mitigate covariate shift by incorporating a denoising mechanism that enhances the contraction properties of the state transition mapping. Our approach involves training two neural networks: a dynamics model $f$ that predicts the next state from the current state, and a joint state-action denoising policy network $d$ that refines this state prediction via denoising and outputs the corresponding action. We provide theoretical analysis showing that the denoising network acts as a local contraction mapping, reducing the error propagation of the state transition and improving stability. Our method is straightforward to implement and can be easily integrated with existing imitation learning frameworks without requiring additional expert data or complex modifications to the training procedure. Empirical results demonstrate that our approach effectively improves the success rate of various imitation learning tasks under noise perturbation.
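
The contraction argument in the abstract can be illustrated with a toy numerical sketch. The example below is not the paper's implementation: the linear maps `A_f` and `A_d`, the helper `rollout_error`, and the choice to compose the two maps as `d(f(x))` are all assumptions made for illustration. It only shows the generic fact that composing a mildly expansive map with a map whose Lipschitz constant is below one makes the gap between a nominal rollout and a perturbed rollout shrink over time.

```python
import numpy as np

def rollout_error(step, x0, x0_perturbed, n_steps):
    """Distance between a nominal and a perturbed rollout after each step."""
    x, y = x0.copy(), x0_perturbed.copy()
    errors = [np.linalg.norm(x - y)]
    for _ in range(n_steps):
        x, y = step(x), step(y)
        errors.append(np.linalg.norm(x - y))
    return np.array(errors)

# Hypothetical linear stand-ins, NOT the paper's learned networks.
A_f = 1.1 * np.eye(2)  # dynamics predictor f: mildly expansive (Lipschitz constant 1.1)
A_d = 0.5 * np.eye(2)  # denoiser d: contractive (Lipschitz constant 0.5 < 1)

def f(x):
    return A_f @ x

def d(x):
    return A_d @ x

x0 = np.array([1.0, 0.0])
x0_noisy = x0 + np.array([0.1, -0.1])  # perturbed initial state

# Rolling out f alone lets the initial error grow by a factor of 1.1 per step.
err_f_only = rollout_error(f, x0, x0_noisy, n_steps=10)

# Rolling out the composition d(f(x)) shrinks it by 0.5 * 1.1 = 0.55 per step.
err_denoised = rollout_error(lambda x: d(f(x)), x0, x0_noisy, n_steps=10)

print("f only:   ", err_f_only[[0, 5, 10]])
print("f then d: ", err_denoised[[0, 5, 10]])
```

In this toy setting the per-step error factor is simply the product of the two Lipschitz constants, so the denoiser only needs to contract strongly enough to offset any expansion introduced by the dynamics predictor. For learned networks the paper's claim is local, so a sketch like this says nothing about behavior far from the expert data.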

Paper Structure

This paper contains 40 sections, 26 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Comparison of trajectory prediction methods. The black curve shows the expert manifold (ground truth), and the black dot indicates a noisy initial state. The blue trajectory and vector field show the prediction using only the learned drift network $f$, while the orange trajectory and vector field show the prediction using the combined drift and denoising networks ($f + d$). The denoising network helps pull the trajectory back to the expert manifold, counteracting covariate shift.
  • Figure 2: Sensitivity Reduction Ratio vs. Noise Factor. The plot shows how the sensitivity reduction ratio $\rho$ changes with increasing Gaussian noise standard deviation $\sigma$. A ratio $\rho < 1$ indicates that the denoising mechanism reduces sensitivity compared to behavior cloning (BC) alone. The ratio reaches its minimum at around $\sigma = 0.1$, where noise resilience is strongest. Beyond this point the ratio increases, suggesting that excessive noise forces the denoising network to rely more heavily on the current state $x_t$ to infer the next state, thereby diminishing the contraction effect. This behavior aligns with our residual interpretation: the denoising mechanism is effective under moderate noise levels and loses effectiveness when noise becomes too large.
  • Figure 3: Ablation study comparing DeCIL with a joint state-action prediction baseline. As noise increases, DeCIL retains high performance, while the baseline's performance degrades rapidly.

Theorems & Definitions (1)

  • Definition 4.1: Sensitivity Ratio