Demonstration Sidetracks: Categorizing Systematic Non-Optimality in Human Demonstrations

Shijie Fang, Hang Yu, Qidi Fang, Reuben M. Aronson, Elaine S. Short

TL;DR

This work shows that non-expert human demonstrations in learning from demonstration (LfD) are not random noise but exhibit systematic patterns, termed Demonstration Sidetracks. Through a public-space study with 40 participants performing a long-horizon ice-cream topping task, the authors identify four types of sidetracks (Exploration, Mistake, Alignment, and Pause) and one control pattern (one-dimension control), and show that these behaviors cluster around task-phase changes and are influenced by the control interface. Replaying demonstrations in simulation with open-coded annotations and expert validation provides a dataset and methodology for identifying suboptimal yet structured demonstrations. The findings argue for incorporating realistic suboptimality models into LfD algorithms and for bridging the gap between controlled lab data and real-world robot deployments.

Abstract

Learning from Demonstration (LfD) is a popular approach for robots to acquire new skills, but most LfD methods suffer from imperfections in human demonstrations. Prior work typically treats these suboptimalities as random noise. In this paper we study non-optimal behaviors in non-expert demonstrations and show that they are systematic, forming what we call demonstration sidetracks. Using a public space study with 40 participants performing a long-horizon robot task, we recreated the setup in simulation and annotated all demonstrations. We identify four types of sidetracks (Exploration, Mistake, Alignment, Pause) and one control pattern (one-dimension control). Sidetracks appear frequently across participants, and their temporal and spatial distribution is tied to task context. We also find that users' control patterns depend on the control interface. These insights point to the need for better models of suboptimal demonstrations to improve LfD algorithms and bridge the gap between lab training and real-world deployment. All demonstrations, infrastructure, and annotations are available at https://github.com/AABL-Lab/Human-Demonstration-Sidetracks.

Paper Structure

This paper contains 23 sections and 6 figures.

Figures (6)

  • Figure 1: Public space study and ice cream topping task. Participants provided demonstrations in a non-lab setup with a long-horizon task. The goal of the task is to control the robot arm to first pick up one of the four topping jars and then pour the toppings onto an ice cream.
  • Figure 2: Experiment pipeline and demonstration sidetracks. Non-expert demonstrations were collected by having participants control a real robot. We then replayed demonstrations in simulation to annotate the demonstration sidetracks. We identified four types of demonstration sidetracks and one control pattern. We illustrated demonstration sidetracks -- Exploration, Mistake, Alignment, and Pause -- on the right side of our figure.
  • Figure 3: Demonstration sidetracks frequency and ratio. The left figure shows the number of demonstration sidetracks observed across all demonstrators. The X-axis is the participant ID, and the Y-axis is the number of each type of demonstration sidetrack. The right figure shows the percentage of time spent on each type of demonstration sidetrack relative to the total time. The results indicate that demonstration sidetracks are widespread and frequent in non-expert demonstrations.
  • Figure 4: Temporal relationships between task phases and demonstration sidetracks. The upper-left part shows the percentage of demonstration sidetracks in different sub-tasks. The bottom-left part shows the number of demonstration sidetracks that occurred within a 40% timestep window around the phase change. The X-axis displays the phases of the task in order, and the Y-axes show percentage and frequency, respectively. The right part shows the percentage of each type of demonstration sidetrack occurring within a 4-second window around the task phase change. We found that the occurrence of demonstration sidetracks is associated with changes in task phase.
  • Figure 5: The spatial distribution of Alignment behaviors. The points represent the robot end-effector positions at the start and end of each Alignment behavior. The red circles represent the positions of the four jars, while the green circle represents the ice cream cup. Points within 0.1 meters of the jars and the cup are highlighted in orange and pink. Other points outside this range are shown in blue. We found that Alignment behaviors occur more frequently around target objects.
  • ...and 1 more figure
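
The temporal and spatial checks described in the captions for Figures 4 and 5 can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the `Sidetrack` record, the function names, and the choice to center the 4-second window on the phase change are assumptions made for illustration. Only the 4-second window and the 0.1-meter radius come from the captions.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class Sidetrack:
    """Hypothetical annotation record (not the dataset's actual schema)."""
    kind: str     # "Exploration", "Mistake", "Alignment", or "Pause"
    start: float  # start time in seconds
    end: float    # end time in seconds


def near_phase_change(sidetracks, phase_changes, window=4.0):
    """Keep sidetracks that overlap a window centered on any phase change.

    Centering the window on the phase change is an assumption; the caption
    only says "a 4-second window around the task phase change".
    """
    half = window / 2
    return [
        s for s in sidetracks
        if any(s.start <= t + half and s.end >= t - half for t in phase_changes)
    ]


def near_target(point, targets, radius=0.1):
    """True if an end-effector position lies within `radius` meters of any target."""
    return any(dist(point, t) <= radius for t in targets)


# Toy data, invented purely for illustration.
annotations = [
    Sidetrack("Pause", 9.0, 10.5),
    Sidetrack("Exploration", 20.0, 25.0),
]
close = near_phase_change(annotations, phase_changes=[12.0])
jar_positions = [(0.0, 0.0), (0.5, 0.2)]
```

With these toy values, only the Pause annotation overlaps the window around the phase change at 12 seconds, and a point such as `(0.05, 0.0)` counts as near the first jar while `(0.3, 0.0)` does not.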