InterPReT: Interactive Policy Restructuring and Training Enable Effective Imitation Learning from Laypersons

Feiyu Gavin Zhu; Jean Oh; Reid Simmons

TL;DR

InterPReT lowers the barrier for laypersons to teach AI agents by combining instruction-guided policy restructuring with demonstration-based learning. The method maintains a sparse, interpretable policy graph, uses a large language model to generate and restructure that graph from user instructions, and learns the graph's parameters by imitation learning from demonstrations. A multi-turn user study on a driving task shows that InterPReT yields more robust policies under unseen conditions and with fewer demonstrations, without reducing usability. The results suggest that end-user-focused imitation learning systems can use natural language instructions and an interpretable policy structure to support broader, safer adoption.

Abstract

Imitation learning has shown success in many tasks by learning from expert demonstrations. However, most existing work relies on large-scale demonstrations from technical professionals and close monitoring of the training process. Both requirements are challenging for a layperson who wants to teach an agent new skills. To lower the barrier of teaching AI agents, we propose Interactive Policy Restructuring and Training (InterPReT), which takes user instructions to continually update the policy structure and optimizes its parameters to fit user demonstrations. This enables end-users to interactively give instructions and demonstrations, monitor the agent's performance, and review the agent's decision-making strategies. A user study (N=34) on teaching an AI agent to drive in a racing game confirms that, compared to a generic imitation learning baseline, our approach yields more robust policies without impairing system usability when a layperson is responsible for both giving demonstrations and determining when to stop. This shows that our method is more suitable for end-users without much technical background in machine learning who want to train a dependable policy.

Paper Structure (32 sections, 13 equations, 19 figures)

Introduction
Related Work
Interactive Learning from Human Feedback
Complementing Learning with Language
Interactive Learning with Code Generation
InterPReT: Interactive Learning from Both Demonstrations and Instructions
Preliminary: Structured Policy
Agent
Policy Training
Policy Restructuring
Strategy Summary and Rollouts
User Study
Setup
Study Protocol
Hypotheses
...and 17 more sections

Figures (19)

Figure 1: Interaction modes in InterPReT. The user repeatedly interacts with the agent until they are satisfied.
Figure 2: A minimal example of a structured policy representing a proportional controller that maintains a constant "desired speed". Solid boxes are variables in $V$ (marked with observation $O$, latent $L$, or action $A$) and dashed boxes are operators in $P$. Weights $\Theta$ are associated with the edges. During inference, if the observed "current speed" is $O_1 = 40$, then we propagate the values $P_1 = 1, L_2 = 60, P_2 = 20, L_3 = 20, P_3 = 20, P_4 = 0, A_4 = 0.2, A_5 = 0$, and the output action is $\hat{a} = [0.2, 0]^T$.
Figure 3: Environment rendering for the participant (left) and state representation for the policy (right). The coordinates for subsequent tiles are omitted due to space constraints.
Figure 4: Average speed in nominal condition
Figure 5: Number of demonstrations used
...and 14 more figures
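
The forward pass described in the Figure 2 caption can be traced with a short sketch. The caption gives only the node values, not the operators or edge weights, so everything that produces those values here is an assumption made for illustration: a constant operator for $P_1$, a subtraction for $P_2$, positive-part and negative-part operators for $P_3$ and $P_4$, a weight of 60 on the edge into $L_2$, and a gain of 0.01 on the edges into $A_4$ and $A_5$. The function name `proportional_policy` and its parameters are hypothetical.

```python
def proportional_policy(current_speed, desired_speed=60.0, gain=0.01):
    """Trace the Figure 2 example under assumed operator semantics.

    The operators and weights below are guesses chosen to reproduce the
    values listed in the caption; they are not taken from the paper.
    """
    v = {}
    v["O_1"] = current_speed            # observation: current speed
    v["P_1"] = 1.0                      # assumed constant operator
    v["L_2"] = desired_speed * v["P_1"] # assumed edge weight: desired speed
    v["P_2"] = v["L_2"] - v["O_1"]      # assumed subtraction operator
    v["L_3"] = v["P_2"]                 # latent: speed error
    v["P_3"] = max(v["L_3"], 0.0)       # assumed positive-part operator
    v["P_4"] = max(-v["L_3"], 0.0)      # assumed negative-part operator
    v["A_4"] = gain * v["P_3"]          # first action component
    v["A_5"] = gain * v["P_4"]          # second action component
    return v, [v["A_4"], v["A_5"]]


values, action = proportional_policy(current_speed=40.0)
print(values)
print(action)
```

With `current_speed=40.0`, the intermediate values match those listed in the caption, and the action agrees with $\hat{a} = [0.2, 0]^T$ up to floating-point rounding. In this reading, the two values a learner would fit from demonstrations are the desired speed and the gain, while the graph itself stays fixed.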
