imitation: Clean Imitation Learning Implementations

Adam Gleave; Mohammad Taufeeque; Juan Rocamonde; Erik Jenner; Steven H. Wang; Sam Toyer; Maximilian Ernestus; Nora Belrose; Scott Emmons; Stuart Russell

TL;DR

The paper presents imitation, an open-source library that standardizes seven reward and imitation-learning algorithms within a modular PyTorch/SB3 framework to provide reliable baselines and facilitate rapid algorithm development. It emphasizes a consistent API, comprehensive documentation, and an experimental framework with replicable scripts and rigorous testing (98% coverage). Thorough benchmarking against prior implementations demonstrates competitive performance across standard Gym/MuJoCo tasks, with clear notes on where certain algorithms underperform in specific environments. The work aims to improve reproducibility and accessibility for researchers by delivering robust baselines, extensible code, and detailed benchmarking results.

Abstract

imitation provides open-source implementations of imitation and reward learning algorithms in PyTorch. We include three inverse reinforcement learning (IRL) algorithms, three imitation learning algorithms and a preference comparison algorithm. The implementations have been benchmarked against previous results, and automated tests cover 98% of the code. Moreover, the algorithms are implemented in a modular fashion, making it simple to develop novel algorithms in the framework. Our source code, including documentation and examples, is available at https://github.com/HumanCompatibleAI/imitation

Paper Structure (11 sections, 1 figure, 3 tables)

This paper contains 11 sections, 1 figure, 3 tables.

Introduction
Features
Comprehensive
Consistent Interface
Experimental Framework
Modularity
Documentation
High-Quality Implementations
Comparison to Other Software
Detailed benchmarking results
Environments used for benchmarking

Figures (1)

Figure 1: Returns of our algorithms, normalized so that $1$ is the return of an expert policy and $0$ is that of a random policy. Our algorithms reach close to expert performance on most environments. Detailed results, including confidence intervals, can be found in Table \ref{tab:results}.
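
The normalization described in the Figure 1 caption can be read as an affine rescaling of each raw return. The following is a minimal sketch under that reading, not the library's own code: the function name `normalize_return` and all numbers are hypothetical and chosen only for illustration.

```python
def normalize_return(raw_return, random_return, expert_return):
    """Rescale a return so a random policy maps to 0.0 and an expert to 1.0.

    Hypothetical helper for illustration; not part of the imitation library.
    """
    gap = expert_return - random_return
    if gap == 0:
        # The scale is undefined when expert and random policies score the same.
        raise ValueError("expert_return and random_return must differ")
    return (raw_return - random_return) / gap


# Made-up returns for a single environment (not results from the paper).
random_return = 10.0
expert_return = 110.0
learner_return = 60.0

score = normalize_return(learner_return, random_return, expert_return)
print(f"normalized return: {score:.2f}")
```

On this scale a value above 1 means the learner outperformed the expert and a value below 0 means it did worse than a random policy, so scores from environments with very different reward magnitudes can share one axis.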
