Overcoming Knowledge Barriers: Online Imitation Learning from Visual Observation with Pretrained World Models
Xingyuan Zhang, Philip Becker-Ehmck, Patrick van der Smagt, Maximilian Karl
TL;DR
This paper identifies two key barriers in Imitation Learning from Observation (ILfO) with pretrained world models: the Embodiment Knowledge Barrier (EKB) and the Demonstration Knowledge Barrier (DKB). It introduces AIME-NoB, which combines online interaction with a data-driven regulariser to mitigate the EKB and uses a surrogate reward to enlarge the policy's state coverage and address the DKB, implemented via a Dreamer-style latent actor-critic. Empirical results on vision-based control benchmarks (DeepMind Control Suite and MetaWorld) show that AIME-NoB substantially improves sample efficiency and final performance over state-of-the-art baselines, with the AIL surrogate variant performing best. The work demonstrates the practical potential of pretrained world models for ILfO and outlines avenues for scheduling, scaling up models, and multi-task extensions.
Abstract
Pretraining and finetuning models has become increasingly popular in decision-making, but serious impediments remain in Imitation Learning from Observation (ILfO) with pretrained models. This study identifies two primary obstacles: the Embodiment Knowledge Barrier (EKB) and the Demonstration Knowledge Barrier (DKB). The EKB emerges from the pretrained models' limitations in handling novel observations, which lead to inaccurate action inference. The DKB, in turn, stems from the reliance on limited demonstration datasets, which restricts the model's adaptability across diverse scenarios. We propose separate solutions to overcome each barrier and apply them to Action Inference by Maximising Evidence (AIME), a state-of-the-art algorithm. The resulting algorithm, AIME-NoB, integrates online interactions and a data-driven regulariser to mitigate the EKB. Additionally, it uses a surrogate reward function to broaden the policy's supported states, addressing the DKB. Our experiments on vision-based control tasks from the DeepMind Control Suite and MetaWorld benchmarks show that AIME-NoB significantly improves sample efficiency and converged performance, presenting a robust framework for overcoming the challenges in ILfO with pretrained models. Code is available at https://github.com/IcarusWizard/AIME-NoB.
