TRAIL: Near-Optimal Imitation Learning with Suboptimal Data
Mengjiao Yang, Sergey Levine, Ofir Nachum
TL;DR
TRAIL addresses the challenge of using abundant suboptimal offline data to improve imitation learning when near-optimal expert data are scarce. It learns a factored transition model from the offline data, extracts a low-dimensional latent action space through an action reparameterization φ, and then performs behavioral cloning (BC) in that latent space, which makes BC more sample-efficient. The authors derive a bound on imitation error that decomposes into transition representation error, action decoding error, and latent BC error, and show improved sample complexity under certain conditions. They instantiate TRAIL with either an energy-based model (EBM) or a linear transition model. Empirical results on AntMaze, locomotion, and DeepMind Control Suite tasks show substantial improvements over vanilla BC and robustness to highly suboptimal offline data, often rivaling offline RL without using rewards. This suggests that learning action representations from offline dynamics is a productive alternative for offline sequential decision making.
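To make the pipeline described above concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: `phi` and `decode` are hand-written, hypothetical stand-ins for the learned action representation and action decoder, and the "latent policy" is just a constant fitted by averaging. The point is only to illustrate the structure: encode expert actions into the latent space, fit BC there, then decode latent actions back into environment actions.

```python
def phi(state, action):
    # Hypothetical stand-in for the learned action representation phi(s, a).
    # In TRAIL this would be learned from offline transition data.
    return action - state


def decode(state, latent):
    # Hypothetical stand-in for the learned action decoder.
    # Here it is the exact inverse of phi, which a learned decoder would only approximate.
    return latent + state


def fit_latent_bc(expert_pairs):
    # Behavioral cloning in the latent space: map each expert (state, action)
    # pair to a latent action, then fit a policy to the latents.
    # The "policy class" in this toy is a single constant latent action.
    latents = [phi(s, a) for s, a in expert_pairs]
    mean_latent = sum(latents) / len(latents)
    return lambda state: mean_latent


def make_policy(latent_policy):
    # Final policy: choose a latent action, then decode it to an environment action.
    return lambda state: decode(state, latent_policy(state))


# Two expert demonstrations of the rule "action = state + 1".
expert_pairs = [(0.0, 1.0), (2.0, 3.0)]
policy = make_policy(fit_latent_bc(expert_pairs))
print(policy(10.0))  # prints 11.0
```

In this toy, the expert's behavior is constant in the latent space, so two demonstrations are enough to generalize to an unseen state. This is the kind of simplification that a good latent action space is meant to provide, although real learned representations will not be this clean.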
Abstract
The aim in imitation learning is to learn effective policies by utilizing near-optimal expert demonstrations. However, high-quality demonstrations from human experts can be expensive to obtain in large numbers. On the other hand, it is often much easier to obtain large quantities of suboptimal or task-agnostic trajectories, which are not useful for direct imitation, but can nevertheless provide insight into the dynamical structure of the environment, showing what could be done in the environment even if not what should be done. We ask the question, is it possible to utilize such suboptimal offline datasets to facilitate provably improved downstream imitation learning? In this work, we answer this question affirmatively and present training objectives that use offline datasets to learn a factored transition model whose structure enables the extraction of a latent action space. Our theoretical analysis shows that the learned latent action space can boost the sample-efficiency of downstream imitation learning, effectively reducing the need for large near-optimal expert datasets through the use of auxiliary non-expert data. To learn the latent action space in practice, we propose TRAIL (Transition-Reparametrized Actions for Imitation Learning), an algorithm that learns an energy-based transition model contrastively, and uses the transition model to reparametrize the action space for sample-efficient imitation learning. We evaluate the practicality of our objective through experiments on a set of navigation and locomotion tasks. Our results verify the benefits suggested by our theory and show that TRAIL is able to improve baseline imitation learning by up to 4x in performance.
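As an illustration of what learning an energy-based transition model contrastively can look like, the following is a minimal sketch of an InfoNCE-style loss in pure Python. The quadratic energy, the function names, and the use of in-batch negatives are assumptions made for this example and are not taken from the abstract. Each row of `z_actions` plays the role of an embedding of a (state, action) pair, and each row of `z_next` an embedding of the corresponding next state.

```python
import math


def sq_dist(u, v):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((a - b) ** 2 for a, b in zip(u, v))


def info_nce_loss(z_actions, z_next):
    """Average contrastive loss over a batch.

    z_actions[i] is an embedding of (s_i, a_i) and z_next[i] an embedding of
    the observed next state s'_i. The pair (i, i) is the positive; the other
    next-state embeddings in the batch serve as negatives.
    """
    n = len(z_actions)
    total = 0.0
    for i in range(n):
        # Assumed energy: half the squared distance; logits are negative energies.
        logits = [-0.5 * sq_dist(z_actions[i], z_next[j]) for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of the true next state.
        total += log_norm - logits[i]
    return total / n


# Embeddings that match their true next states give a loss near zero;
# swapping the next states gives a large loss.
aligned = info_nce_loss([[0.0, 0.0], [5.0, 5.0]], [[0.0, 0.0], [5.0, 5.0]])
shuffled = info_nce_loss([[0.0, 0.0], [5.0, 5.0]], [[5.0, 5.0], [0.0, 0.0]])
print(aligned < shuffled)  # prints True
```

Minimizing a loss of this form pushes the embedding of each (state, action) pair toward the embedding of the next state it actually produced and away from the other next states in the batch. In a real implementation the embeddings would come from trainable networks and the loss would be minimized with a gradient-based optimizer.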
