MILE: Model-based Intervention Learning
Yigit Korkmaz, Erdem Bıyık
TL;DR
MILE tackles the problem of learning from human interventions in imitation learning by introducing a fully differentiable intervention mechanism that explains when and why humans intervene. It jointly trains a mental model of the human and the robot policy, using a probit-based intervention probability and two loss terms that balance intervention likelihood and action fidelity. Across simulation and real-world robot tasks, MILE demonstrates superior sample efficiency and robust performance with only a handful of interventions, and a human-subject study confirms the model's alignment with human behavior. This approach enables effective, data-efficient policy refinement in human-in-the-loop robotics without requiring extensive offline demonstrations or reward design.
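To make the phrase "probit-based intervention probability" concrete, here is a minimal sketch of a generic probit link and the matching likelihood term. It is not MILE's exact formulation, which this summary does not spell out. The names `value_gap`, `cost`, and `scale` are hypothetical: `value_gap` stands for how much worse the expert judges the robot's action to be than their own, `cost` for the effort of stepping in, and `scale` for noise in the expert's judgment. The point of the sketch is that both intervention and non-intervention timesteps contribute a smooth, differentiable likelihood term.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def intervention_probability(value_gap: float, cost: float = 0.0,
                             scale: float = 1.0) -> float:
    """Generic probit link (an assumption, not the paper's exact model).

    The probability of intervening grows smoothly as the perceived gap
    between the expert's action and the robot's action exceeds the cost.
    """
    return normal_cdf((value_gap - cost) / scale)


def intervention_nll(p_intervene: float, intervened: bool) -> float:
    """Negative log-likelihood of one observed intervene / no-intervene event."""
    eps = 1e-12  # clamp to avoid log(0)
    p = min(max(p_intervene, eps), 1.0 - eps)
    return -math.log(p) if intervened else -math.log(1.0 - p)


# A timestep without intervention still carries signal: a small gap makes
# "no intervention" likely, so its loss is low.
p_small_gap = intervention_probability(value_gap=0.1, cost=1.0)
loss_no_intervention = intervention_nll(p_small_gap, intervened=False)
```

In a full training loop, a term like `intervention_nll` would be combined with an action-matching loss on the intervention data; how the two are weighted is a design choice that this sketch leaves open.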
Abstract
Imitation learning techniques have been shown to be highly effective in real-world control scenarios, such as robotics. However, these approaches not only suffer from compounding errors but also require human experts to provide complete trajectories. Although interactive methods exist in which an expert oversees the robot and intervenes when needed, these extensions usually use only the data collected during intervention periods and ignore the feedback signal hidden in non-intervention timesteps. In this work, we develop a model of how interventions occur in such settings, and show that it is possible to learn a policy from just a handful of expert interventions. Our key insight is that expert feedback carries crucial information about the quality of the current state and the optimality of the chosen action, whether or not an intervention occurs. We evaluate our method on various discrete and continuous simulation environments, a real-world robotic manipulation task, and a human-subject study. Videos and the code can be found at https://liralab.usc.edu/mile.
