Sample-efficient Adversarial Imitation Learning
Dahuin Jung, Hyungyu Lee, Sungroh Yoon
TL;DR
This work tackles sample inefficiency in imitation learning by coupling self-supervised representation learning with adversarial imitation. It learns temporally predictive state and action representations and uses a novel swapping corruption to generate diverse, in-distribution distortions, all integrated into a GAIL-style objective. The theoretical motivation connects reduced VC dimension and mutual information (MI)-based contrastive learning to improved generalization. Extensive experiments on MuJoCo and Atari RAM show strong gains, including a $39\%$ relative improvement over existing adversarial imitation learning methods on MuJoCo with $N_E=100$ expert state-action pairs. The approach extends to imperfect demonstrations and discrete control, though it increases model complexity and compute requirements. Overall, the method substantially advances sample-efficient imitation through robust representations and principled auxiliary tasks.
Abstract
Imitation learning, in which learning is performed by demonstration, has been studied and advanced for sequential decision-making tasks in which a reward function is not predefined. However, imitation learning methods still require numerous expert demonstration samples to successfully imitate an expert's behavior. To improve sample efficiency, we utilize self-supervised representation learning, which can generate vast training signals from the given data. In this study, we propose a self-supervised representation-based adversarial imitation learning method to learn state and action representations that are robust to diverse distortions and temporally predictive, on non-image control tasks. In particular, in comparison with existing self-supervised learning methods for tabular data, we propose a different corruption method for state and action representations that is robust to diverse distortions. We theoretically and empirically observe that making an informative feature manifold with less sample complexity significantly improves the performance of imitation learning. The proposed method shows a 39% relative improvement over existing adversarial imitation learning methods on MuJoCo in a setting limited to 100 expert state-action pairs. Moreover, we conduct comprehensive ablations and additional experiments using demonstrations with varying optimality to provide insights into a range of factors.
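To illustrate the kind of corruption the abstract refers to, the following is a minimal sketch of a swap-based corruption for tabular state or action vectors. It is not the authors' implementation: this section does not spell out the exact procedure, so the sketch assumes one common variant in which each selected feature is replaced by the value of the same feature from another sample in the batch. The function name `swap_corrupt` and the parameter `swap_prob` are hypothetical.

```python
import numpy as np


def swap_corrupt(batch, swap_prob=0.3, rng=None):
    """Corrupt a batch of feature vectors by swapping in values from other rows.

    Assumption (not from the paper): each entry is independently selected with
    probability `swap_prob` and replaced by the same feature taken from a
    randomly chosen row of the batch.
    """
    rng = np.random.default_rng() if rng is None else rng
    batch = np.asarray(batch)
    n, d = batch.shape

    # Boolean mask of the entries to corrupt.
    mask = rng.random((n, d)) < swap_prob

    # For every entry, pick a donor row; the column index stays the same,
    # so a replaced value always comes from that feature's observed values.
    donors = rng.integers(0, n, size=(n, d))
    swapped = batch[donors, np.arange(d)]

    corrupted = np.where(mask, swapped, batch)
    return corrupted, mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    states = rng.normal(size=(5, 3))  # toy batch: 5 states, 3 features each
    corrupted, mask = swap_corrupt(states, swap_prob=0.5, rng=rng)
    print(mask.sum(), "entries were selected for swapping")
```

Because replacement values are drawn from the same feature column, every corrupted entry stays within the range of values actually observed for that feature. This is one way to read "in-distribution" distortion, in contrast to additive noise or zero-masking.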
