Inverse Reinforcement Learning without Reinforcement Learning
Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu
TL;DR
Traditional IRL is inefficient because every reward update requires an inner-loop RL solve, and these solves dominate computation. The paper addresses this with expert resets, which use the expert's state distribution to remove the need for global exploration in the inner loop. It presents two reductions, MMDP and NRMM, that give an exponential speedup with polynomial sample complexity, and a meta-algorithm, FILTER, that blends expert resets with standard exploration to trade off their horizon-dependent errors. Theoretical results show improved sample complexity and error bounds, and experiments on continuous-control benchmarks show faster and more robust imitation learning. The work offers a reduction-based framework for faster IRL and suggests that expert demonstrations could be exploited in similar ways in other problems.
Abstract
Inverse Reinforcement Learning (IRL) is a powerful set of techniques for imitation learning that aims to learn a reward function that rationalizes expert demonstrations. Unfortunately, traditional IRL methods suffer from a computational weakness: they require repeatedly solving a hard reinforcement learning (RL) problem as a subroutine. This is counter-intuitive from the viewpoint of reductions: we have reduced the easier problem of imitation learning to repeatedly solving the harder problem of RL. Another thread of work has proved that access to the side-information of the distribution of states where a strong policy spends time can dramatically reduce the sample and computational complexities of solving an RL problem. In this work, we demonstrate for the first time a more informed imitation learning reduction where we utilize the state distribution of the expert to alleviate the global exploration component of the RL subroutine, providing an exponential speedup in theory. In practice, we find that we are able to significantly speed up the prior art on continuous control tasks.
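To make the role of the expert state distribution concrete, the following is a minimal Python sketch of a rollout that begins from a state the expert visited rather than from the environment's initial state. It illustrates the reset idea only and is not the authors' MMDP, NRMM, or FILTER. All names (`sample_reset_state`, `rollout_from_reset`, `step_fn`, `policy`) are hypothetical, and it assumes a deterministic, finite-horizon environment in which each demonstration is a list of states indexed by time step.

```python
import random


def sample_reset_state(expert_trajectories, rng):
    """Pick a demonstration and a time step h; return h and the expert state at h."""
    traj = rng.choice(expert_trajectories)
    h = rng.randrange(len(traj))
    return h, traj[h]


def rollout_from_reset(step_fn, policy, expert_trajectories, horizon, rng):
    """Roll the learner's policy forward from an expert state instead of the start state.

    step_fn(state, action) -> next state   (assumed deterministic for this sketch)
    policy(state, t)       -> action
    """
    h, state = sample_reset_state(expert_trajectories, rng)
    visited = [state]
    # The learner only acts from step h onward; steps before h are
    # "explored" for free by the expert demonstration.
    for t in range(h, horizon - 1):
        state = step_fn(state, policy(state, t))
        visited.append(state)
    return h, visited
```

Because the learner starts on the expert's state distribution, it never has to discover good states by exploring from the initial state, which is the global exploration component the abstract refers to.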
