Offline Imitation Learning with Variational Counterfactual Reasoning
Bowei He, Zexu Sun, Jinxin Liu, Shuai Zhang, Xu Chen, Chen Ma
TL;DR
The paper tackles offline imitation learning when expert data is scarce and unlabeled data is noisy, proposing Offline Imitation Learning with Counterfactual Data Augmentation (OILCA). OILCA uses an identifiable variational counterfactual reasoning framework to generate high-quality counterfactual expert data via a conditional VAE and SCM-based counterfactuals, improving generalization without online interaction. The authors provide identifiability and generalization analyses and demonstrate strong improvements on in-distribution benchmarks (DeepMind Control Suite) and out-of-distribution scenarios (CausalWorld). The approach is validated with extensive experiments and accompanied by a public code release, highlighting its practical impact for robust offline IL in diverse environments.
Abstract
In offline imitation learning (IL), an agent aims to learn an optimal expert behavior policy without additional online environment interactions. However, in many real-world scenarios, such as robotics manipulation, the offline dataset is collected from suboptimal behaviors without rewards. Due to the scarce expert data, the agents usually suffer from simply memorizing poor trajectories and are vulnerable to variations in the environments, lacking the capability of generalizing to new environments. To automatically generate high-quality expert data and improve the generalization ability of the agent, we propose a framework named \underline{O}ffline \underline{I}mitation \underline{L}earning with \underline{C}ounterfactual data \underline{A}ugmentation (OILCA) by doing counterfactual inference. In particular, we leverage identifiable variational autoencoder to generate \textit{counterfactual} samples for expert data augmentation. We theoretically analyze the influence of the generated expert data and the improvement of generalization. Moreover, we conduct extensive experiments to demonstrate that our approach significantly outperforms various baselines on both \textsc{DeepMind Control Suite} benchmark for in-distribution performance and \textsc{CausalWorld} benchmark for out-of-distribution generalization. Our code is available at \url{https://github.com/ZexuSun/OILCA-NeurIPS23}.
