MILE: Model-based Intervention Learning

Yigit Korkmaz, Erdem Bıyık

TL;DR

MILE tackles the problem of learning from human interventions in imitation learning by introducing a fully differentiable intervention mechanism that explains when and why humans intervene. It jointly trains a mental model of the human and the robot policy, using a probit-based intervention probability and two loss terms that balance intervention likelihood and action fidelity. Across simulation and real-world robot tasks, MILE demonstrates superior sample efficiency and robust performance with only a handful of interventions, and a human-subject study confirms the model's alignment with human behavior. This approach enables effective, data-efficient policy refinement in human-in-the-loop robotics without requiring extensive offline demonstrations or reward design.
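To make the idea of a probit-based intervention probability concrete, here is a minimal sketch of a generic probit link. It is not the paper's formulation: the names `expected_gain`, `cost`, and `noise_scale` are hypothetical, and the likelihood below only illustrates how both intervention and non-intervention timesteps could contribute to a training signal.

```python
import math


def standard_normal_cdf(x: float) -> float:
    """Phi(x): CDF of the standard normal, computed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def intervention_probability(expected_gain: float,
                             cost: float = 0.0,
                             noise_scale: float = 1.0) -> float:
    """Generic probit link (an assumption, not the paper's exact model).

    expected_gain: hypothetical estimate of how much better the human
                   expects to do than the robot in the current state.
    cost:          hypothetical effort cost of taking over control.
    noise_scale:   standard deviation of the Gaussian decision noise.
    """
    return standard_normal_cdf((expected_gain - cost) / noise_scale)


def intervention_nll(expected_gains, intervened,
                     cost: float = 0.0,
                     noise_scale: float = 1.0,
                     eps: float = 1e-12) -> float:
    """Negative log-likelihood of observed intervene / no-intervene decisions.

    Timesteps without an intervention also contribute a term, so the
    absence of an intervention carries information as well.
    """
    total = 0.0
    for gain, did_intervene in zip(expected_gains, intervened):
        p = intervention_probability(gain, cost, noise_scale)
        # Bernoulli log-likelihood, clamped away from log(0).
        total -= math.log(max(p if did_intervene else 1.0 - p, eps))
    return total


if __name__ == "__main__":
    gains = [2.0, -1.5, 0.1]
    decisions = [True, False, False]
    print(intervention_nll(gains, decisions))
```

Because the normal CDF is smooth, a loss of this shape is differentiable in whatever parameters produce `expected_gain`, which is the property that makes a probit link convenient for gradient-based training.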

Abstract

Imitation learning techniques have been shown to be highly effective in real-world control scenarios, such as robotics. However, these approaches not only suffer from compounding errors but also require human experts to provide complete trajectories. Although there exist interactive methods where an expert oversees the robot and intervenes if needed, these extensions usually utilize only the data collected during intervention periods and ignore the feedback signal hidden in non-intervention timesteps. In this work, we create a model that formulates how interventions occur in such cases, and show that it is possible to learn a policy with just a handful of expert interventions. Our key insight is that expert feedback provides crucial information about the quality of the current state and the optimality of the chosen action, regardless of whether an intervention occurs. We evaluate our method on various discrete and continuous simulation environments, a real-world robotic manipulation task, and a human subject study. Videos and the code can be found at https://liralab.usc.edu/mile.

Paper Structure

This paper contains 12 sections, 11 equations, 7 figures, 1 table, 1 algorithm.

Figures (7)

  • Figure 1: Overall MILE System. A human operator oversees the robot during task execution and may decide to take over control at any timestep. The human makes the decision to intervene or not based on their prediction of the robot's potential failure, without observing the robot's action in that particular state. During the data collection phase, all interactions with the environment are recorded, both with and without interventions. The policy is then trained on this dataset using an iterative process that incorporates our novel intervention model.
  • Figure 2: Framework for learning from interventions. Starting from an initial policy $\pi_\theta$, we jointly train the mental model and the policy using our intervention model.
  • Figure 3: Success rates for single-iteration training (mean$\pm$std). MILE was trained with $N=1$ iteration and $k=15$ episodes. For a fair comparison, we train the other baselines until their datasets contain the same number of interventions as ours.
  • Figure 4: Success rates for iterative training (mean$\pm$s.e.).
  • Figure 5: Success rates for the demo ablation study (mean$\pm$s.e.).
  • ...and 2 more figures