Denoising-based Contractive Imitation Learning
Macheng Shen, Jishen Peng, Zefang Huang
TL;DR
This work tackles covariate shift in imitation learning by using a denoising mechanism to stabilize state transitions. It introduces DeCIL, a simple two-network approach: a dynamics predictor $f$ proposes the next state, and a denoising policy network $d$ refines that prediction so that the resulting state-transition mapping is contractive, a property supported by a Jacobian-based theoretical analysis. The denoising objective drives the local Lipschitz constant of the denoising component below one, which reduces error propagation and drift. Experiments on Intersection and MetaWorld tasks show improved robustness under noise and with scarce data. The method integrates with existing imitation-learning pipelines and requires neither additional expert data nor complex training regimes, which makes it practical for real-world applications.
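To make the contraction claim concrete, the following is a sketch of the standard Lipschitz composition bound, not the paper's own theorem. The symbols $d_s$ (the state output of $d$), $L_f$ and $L_d$ (local Lipschitz constants of $f$ and $d_s$), and $T = d_s \circ f$ are notation assumed here for illustration.

$$
\|T(s) - T(s')\| = \|d_s(f(s)) - d_s(f(s'))\| \le L_d \, \|f(s) - f(s')\| \le L_d L_f \, \|s - s'\|
$$

Under these assumptions, a deviation between two nearby states shrinks by at least the factor $L_d L_f$ per step whenever $L_d L_f < 1$. Having $L_d < 1$ helps but does not by itself guarantee this if $f$ is strongly expansive.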
Abstract
A fundamental challenge in imitation learning is the \emph{covariate shift} problem. Existing methods to mitigate covariate shift often require additional expert interactions, access to environment dynamics, or complex adversarial training, which may not be practical in real-world applications. In this paper, we propose a simple yet effective method (DeCIL) to mitigate covariate shift by incorporating a denoising mechanism that enhances the contraction properties of the state transition mapping. Our approach involves training two neural networks: a dynamics model $f$ that predicts the next state from the current state, and a joint state-action denoising policy network $d$ that refines this state prediction via denoising and outputs the corresponding action. We provide theoretical analysis showing that the denoising network acts as a local contraction mapping, reducing the error propagation of the state transition and improving stability. Our method is straightforward to implement and can be easily integrated with existing imitation learning frameworks without requiring additional expert data or complex modifications to the training procedure. Empirical results demonstrate that our approach effectively improves the success rate on various imitation learning tasks under noise perturbation.
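The two-network pipeline described above can be illustrated with a toy sketch. It is not the authors' implementation: the trained networks are replaced by hypothetical fixed linear maps (`A`, `W_s`, `W_a`), the dimensions are arbitrary, and the rollout iterates the learned mapping itself rather than a real environment. Training with a denoising objective is omitted entirely.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM = 4, 2  # arbitrary sizes, chosen for illustration

# Hypothetical stand-ins for trained networks (assumptions, not from the paper).
A = 1.05 * np.eye(STATE_DIM)    # dynamics predictor f: slightly expansive
W_s = 0.8 * np.eye(STATE_DIM)   # state head of d: contractive (norm 0.8 < 1)
W_a = rng.normal(size=(ACTION_DIM, STATE_DIM))  # action head of d


def f(state):
    """Predict the next state from the current state."""
    return A @ state


def d(predicted_next_state):
    """Refine the predicted next state and output the corresponding action."""
    refined_state = W_s @ predicted_next_state
    action = W_a @ predicted_next_state
    return refined_state, action


def step(state):
    """One pass through the pipeline: state -> f -> d -> (refined state, action)."""
    return d(f(state))


def rollout_gap(s0, s0_perturbed, horizon):
    """Track the distance between a nominal and a perturbed rollout."""
    s, s_p = s0, s0_perturbed
    gaps = [float(np.linalg.norm(s - s_p))]
    for _ in range(horizon):
        s, _ = step(s)
        s_p, _ = step(s_p)
        gaps.append(float(np.linalg.norm(s - s_p)))
    return gaps


if __name__ == "__main__":
    s0 = np.ones(STATE_DIM)
    noise = 0.1 * rng.normal(size=STATE_DIM)
    for t, gap in enumerate(rollout_gap(s0, s0 + noise, horizon=5)):
        print(f"step {t}: gap = {gap:.4f}")
```

In this toy setting the composite map has gain 0.8 × 1.05 = 0.84, so the gap between the nominal and perturbed rollouts shrinks geometrically even though the dynamics predictor alone would amplify it.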
