Going Beyond Expert Performance via Deep Implicit Imitation Reinforcement Learning
Iason Chrysomallis, Georgios Chalkiadakis
TL;DR
This work tackles imitation learning from observation-only, potentially suboptimal expert data by proposing a deep implicit imitation RL framework that blends expert guidance with independent environmental interaction. The core method, DIIQN, infers expert actions online, samples from expert observations, and uses a dynamic confidence mechanism to balance expert-guided and self-guided learning, enabling performance beyond that of the observed demonstrations. An extension, HA-DIIQN, addresses heterogeneous action spaces by identifying infeasible transitions and discovering feasible bridges, allowing knowledge transfer across agents with different capabilities. Experimental results show up to 130% higher episodic returns than a DQN baseline and up to 64% faster learning in heterogeneous settings, with robust performance across dataset sizes and hyperparameters, highlighting the practical potential of leveraging suboptimal, observation-only data in real-world RL tasks.
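To make the two DIIQN ingredients named above more concrete, the following is a minimal tabular sketch, not the authors' implementation. It assumes that expert actions are inferred by matching an expert transition (s, s') against transitions the agent itself has produced, and that the confidence is a scalar in [0, 1] that mixes an expert-guided and a self-directed bootstrap target. The class and function names (`ActionInferrer`, `Confidence`, `blended_target`) and the update rule are hypothetical illustrations; the paper's deep, network-based mechanisms may differ.

```python
from collections import Counter, defaultdict


class ActionInferrer:
    """Hypothetical: infer an expert's action from the agent's own (s, a, s') experience."""

    def __init__(self):
        # (state, next_state) -> counts of the agent actions that produced it
        self.counts = defaultdict(Counter)

    def record(self, s, a, s_next):
        self.counts[(s, s_next)][a] += 1

    def infer(self, s, s_next):
        # .get avoids creating empty entries in the defaultdict
        observed = self.counts.get((s, s_next))
        if not observed:
            return None  # the agent has not yet reproduced this expert transition
        return observed.most_common(1)[0][0]


class Confidence:
    """Hypothetical confidence in expert guidance, kept as an exponential moving average."""

    def __init__(self, init=1.0, rate=0.1):
        self.value = init
        self.rate = rate

    def update(self, expert_target, self_target):
        # Move toward 1 while the expert-guided estimate is at least as good,
        # and toward 0 once self-directed learning looks better.
        signal = 1.0 if expert_target >= self_target else 0.0
        self.value += self.rate * (signal - self.value)
        return self.value


def blended_target(r, gamma, q_next_self, q_next_expert, conf):
    """Mix self-directed and expert-guided bootstrap values by the confidence weight."""
    return r + gamma * ((1.0 - conf) * q_next_self + conf * q_next_expert)
```

Because the weight decays whenever the agent's own estimate beats the expert-guided one, the target gradually reverts to ordinary Q-learning, which is one simple way an agent could outgrow a suboptimal demonstrator.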
Abstract
Imitation learning traditionally requires complete state-action demonstrations from optimal or near-optimal experts. These requirements severely limit practical applicability, as many real-world scenarios provide only state observations without corresponding actions, and expert performance is often suboptimal. In this paper we introduce a deep implicit imitation reinforcement learning framework that addresses both limitations by combining deep reinforcement learning with implicit imitation learning from observation-only datasets. Our main algorithm, Deep Implicit Imitation Q-Network (DIIQN), employs an action inference mechanism that reconstructs expert actions through online exploration and integrates a dynamic confidence mechanism that adaptively balances expert-guided and self-directed learning. This enables the agent to leverage expert guidance for accelerated training while maintaining the capacity to surpass suboptimal expert performance. We further extend our framework with a Heterogeneous Actions DIIQN (HA-DIIQN) algorithm to tackle scenarios where expert and agent possess different action sets, a challenge previously unaddressed in the implicit imitation learning literature. HA-DIIQN introduces an infeasibility detection mechanism and a bridging procedure that identifies alternative pathways connecting agent capabilities to expert guidance when direct action replication is impossible. Our experimental results demonstrate that DIIQN achieves up to 130% higher episodic returns than standard DQN, while consistently outperforming existing implicit imitation methods, which cannot exceed expert performance. In heterogeneous action settings, HA-DIIQN learns up to 64% faster than baselines by leveraging expert datasets that conventional approaches cannot use. Extensive parameter sensitivity analysis reveals the framework's robustness across varying dataset sizes and hyperparameter configurations.
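The infeasibility detection and bridging idea can be illustrated with a small sketch, again not the authors' procedure. It assumes a hypothetical deterministic model of the agent's own dynamics, stored as `agent_model[state][action] -> next_state`, and treats an expert transition (s, s') as infeasible when no single agent action is known to produce it. A bridge is then searched for with breadth-first search as the shortest agent action sequence, up to a hypothetical `max_len`, that leads from s to s'. The function names and the choice of BFS are illustrative assumptions.

```python
from collections import deque


def is_infeasible(agent_model, s, s_next):
    """True if no single known agent action takes s directly to s_next."""
    return s_next not in agent_model.get(s, {}).values()


def find_bridge(agent_model, s, s_next, max_len=4):
    """Shortest agent action sequence (length <= max_len) from s to s_next, or None."""
    frontier = deque([(s, [])])
    visited = {s}
    while frontier:
        state, path = frontier.popleft()
        if state == s_next:
            return path
        if len(path) >= max_len:
            continue  # do not extend paths beyond the length budget
        for action, nxt in agent_model.get(state, {}).items():
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, path + [action]))
    return None
```

For example, if an expert moves diagonally on a grid while the agent can only move `right` or `up`, the expert step from (0, 0) to (1, 1) is infeasible for the agent, but the bridge `['right', 'up']` reconnects it to the expert's trajectory.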
