Consistency Matters: Defining Demonstration Data Quality Metrics in Robot Learning from Demonstration

Maram Sakr, H. F. Machiel Van der Loos, Dana Kulic, Elizabeth Croft

TL;DR

This work tackles the quality of demonstrations in robot Learning from Demonstration by formalizing a comprehensive set of consistency metrics that span Cartesian and joint-space motion. Using two user studies with PR2 and UR5, the authors show that demonstration consistency strongly predicts both task success and generalization, with high explained variance in regression models and significant improvements with practice. A practical pipeline is presented that clusters demonstrations into consistent vs inconsistent without requiring expert data or task-specific algorithm modifications, enabling pre-training data curation. The findings suggest that smoother, more consistent demonstrations enable more reliable learning and generalization, and point toward adaptive training and active-learning strategies to further optimize data quality in real-world LfD deployments.

Abstract

Learning from Demonstration (LfD) empowers robots to acquire new skills through human demonstrations, making it feasible for everyday users to teach robots. However, the success of learning and generalization heavily depends on the quality of these demonstrations. Consistency is often used to indicate quality in LfD, yet the factors that define this consistency remain underexplored. In this paper, we evaluate a comprehensive set of motion data characteristics to determine which consistency measures best predict learning performance. By ensuring demonstration consistency prior to training, we enhance models' predictive accuracy and generalization to novel scenarios. We validate our approach with two user studies involving participants with diverse levels of robotics expertise. In the first study (N = 24), users taught a PR2 robot to perform a button-pressing task in a constrained environment, while in the second study (N = 30), participants trained a UR5 robot on a pick-and-place task. Results show that demonstration consistency significantly impacts success rates in both learning and generalization, with 70% and 89% of task success rates in the two studies predicted using our consistency metrics. Moreover, our metrics estimate generalized performance success rates with 76% and 91% accuracy. These findings suggest that our proposed measures provide an intuitive, practical way to assess demonstration data quality before training, without requiring expert data or algorithm-specific modifications. Our approach offers a systematic way to evaluate demonstration quality, addressing a critical gap in LfD by formalizing consistency metrics that enhance the reliability of robot learning from human demonstrations.

Paper Structure

This paper contains 30 sections, 5 equations, 9 figures, 8 tables.

Figures (9)

  • Figure 1: Overview of the proposed approach. Data was collected from users with varying levels of expertise in robotics. The range of metrics listed in Table \ref{tab:metrics} was calculated to serve as feature inputs for the clustering algorithm. The data was then clustered into two groups: consistent and inconsistent. A learning model was subsequently trained using consistent and inconsistent data. Finally, performance was evaluated by calculating the success rate of the trained models.
  • Figure 2: Button-pressing task demonstration overview from (a) the initial position of the robot, (b) an example of a user's struggle to get the robot around the box, (c) an example of a better manoeuvre of the robot around the box, and (d) the robot pressing the button.
  • Figure 3: Task demonstration overview: (a) robot's initial position, (b) picking up the bottle, (c) manoeuvring under the obstacle, (d) placing the bottle, (e) top view of the pickup, obstacle, and place locations, and (f) experimental bottle with spillage line.
  • Figure 4: The task simulations include both the same-task states and the generalized states used to evaluate model performance. (I) In the button-pressing task, the buttons are represented as green spheres. (II) In the pick-and-place task, the pickup and place locations are also represented by spheres. The blue segment of the trajectory represents the movement from the initial position to the pickup location, while the green segment represents the movement from the pickup to the place location.
  • Figure 5: The standardized feature values in the resulting two clusters of demonstrations. The first cluster (blue) includes consistent demonstrations and the second cluster (orange) includes inconsistent demonstrations.
  • ...and 4 more figures
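
The curation step described in the Figure 1 and Figure 5 captions (compute per-demonstration metrics, standardize them, split the demonstrations into two clusters) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration only: the summary does not name the clustering algorithm, so plain k-means with k = 2 is swapped in; the two feature columns (path length and a jerk-like smoothness score) are hypothetical stand-ins for the paper's metric table; and the rule "the cluster with lower mean standardized feature values is the consistent one" assumes that lower values indicate smoother, more consistent motion.

```python
import math

def standardize(rows):
    """Z-score each feature column (zero mean, unit variance)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 when a column is constant to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def two_means(points, iters=50):
    """Plain k-means with k=2 (an assumed stand-in); returns a 0/1 label per point."""
    # Deterministic init: the first point and the point farthest from it.
    first = points[0]
    centers = [first, max(points, key=lambda p: math.dist(p, first))]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min((0, 1), key=lambda k: math.dist(p, centers[k]))
                  for p in points]
        new_centers = []
        for k in (0, 1):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:
                new_centers.append([sum(c) / len(c) for c in zip(*members)])
            else:
                new_centers.append(centers[k])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def consistent_mask(rows):
    """True for demonstrations in the cluster assumed to be 'consistent'."""
    z = standardize(rows)
    labels = two_means(z)
    score = {}
    for k in (0, 1):
        members = [p for p, l in zip(z, labels) if l == k]
        score[k] = sum(map(sum, members)) / max(len(members), 1)
    # Assumption: lower standardized feature values mean more consistent motion.
    good = min(score, key=score.get)
    return [l == good for l in labels]

# Hypothetical per-demonstration features: [path_length, jerk_score]
demos = [[1.0, 0.20], [1.1, 0.25], [0.9, 0.15],
         [2.0, 0.90], [2.2, 1.00], [1.9, 0.80]]
mask = consistent_mask(demos)
print(mask)
```

In a setup like this, only the demonstrations flagged `True` would be passed on to training. Standardizing first keeps a feature with large raw units from dominating the distance computation.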