Consistency Matters: Defining Demonstration Data Quality Metrics in Robot Learning from Demonstration
Maram Sakr, H. F. Machiel Van der Loos, Dana Kulic, Elizabeth Croft
TL;DR
This work addresses demonstration quality in robot Learning from Demonstration (LfD) by formalizing a comprehensive set of consistency metrics that span Cartesian and joint-space motion. In two user studies, one with a PR2 robot and one with a UR5 robot, the authors show that demonstration consistency strongly predicts both task success and generalization, with high explained variance in regression models and significant improvements with practice. They present a practical pipeline that clusters demonstrations into consistent and inconsistent groups without requiring expert data or task-specific algorithm modifications, which enables data curation before training. The findings suggest that smoother, more consistent demonstrations lead to more reliable learning and generalization, and they point toward adaptive training and active-learning strategies for further improving data quality in real-world LfD deployments.
Abstract
Learning from Demonstration (LfD) empowers robots to acquire new skills through human demonstrations, making it feasible for everyday users to teach robots. However, the success of learning and generalization heavily depends on the quality of these demonstrations. Consistency is often used to indicate quality in LfD, yet the factors that define this consistency remain underexplored. In this paper, we evaluate a comprehensive set of motion data characteristics to determine which consistency measures best predict learning performance. By ensuring demonstration consistency prior to training, we enhance models' predictive accuracy and generalization to novel scenarios. We validate our approach with two user studies involving participants with diverse levels of robotics expertise. In the first study (N = 24), users taught a PR2 robot to perform a button-pressing task in a constrained environment, while in the second study (N = 30), participants trained a UR5 robot on a pick-and-place task. Results show that demonstration consistency significantly impacts success rates in both learning and generalization: our consistency metrics predict 70% and 89% of task success rates in the two studies, respectively. Moreover, our metrics estimate generalization success rates with 76% and 91% accuracy, respectively. These findings suggest that our proposed measures provide an intuitive, practical way to assess demonstration data quality before training, without requiring expert data or algorithm-specific modifications. Our approach thereby addresses a critical gap in LfD by formalizing consistency metrics that enhance the reliability of robot learning from human demonstrations.
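To make the idea of scoring demonstrations before training concrete, the following is a minimal sketch and not the authors' pipeline. It assumes two illustrative motion features, path length and log mean squared jerk, which are common smoothness proxies but are not confirmed here as the paper's metrics. It then splits demonstrations into two groups with a small hand-written 2-means. All function names (`path_length`, `log_mean_squared_jerk`, `demo_features`, `label_consistent`) and the synthetic data are hypothetical.

```python
import numpy as np


def path_length(traj):
    """Total path length of a (T, D) trajectory (Cartesian or joint space)."""
    steps = np.diff(traj, axis=0)
    return float(np.linalg.norm(steps, axis=1).sum())


def log_mean_squared_jerk(traj, dt):
    """Log of the mean squared third finite difference (assumed smoothness proxy)."""
    jerk = np.diff(traj, n=3, axis=0) / dt**3
    return float(np.log(np.mean(np.sum(jerk**2, axis=1)) + 1e-12))


def demo_features(demos, dt):
    """One standardized feature row per demonstration."""
    feats = np.array([[path_length(d), log_mean_squared_jerk(d, dt)] for d in demos])
    scale = feats.std(axis=0)
    scale[scale == 0] = 1.0  # avoid dividing by zero for constant features
    return (feats - feats.mean(axis=0)) / scale


def _nearest(x, centers):
    """Index of the closest center for every row of x."""
    dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def label_consistent(demos, dt, iters=50):
    """Return a boolean array: True for demos in the shorter, smoother cluster."""
    x = demo_features(demos, dt)
    score = x.sum(axis=1)  # lower means shorter and smoother
    # Deterministic initialization from the two most extreme demonstrations.
    centers = np.stack([x[score.argmin()], x[score.argmax()]])
    for _ in range(iters):
        assign = _nearest(x, centers)
        new_centers = centers.copy()
        for k in range(2):
            if np.any(assign == k):  # keep the old center if a cluster empties
                new_centers[k] = x[assign == k].mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    assign = _nearest(x, centers)
    low = centers.sum(axis=1).argmin()  # cluster with the lower feature sum
    return assign == low


# Synthetic stand-in data: straight-line reaches with small vs. large jitter.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 100)[:, None]
line = t * np.array([1.0, 0.0, 0.0])
smooth = [line + rng.normal(0, 1e-4, line.shape) for _ in range(4)]
jittery = [line + rng.normal(0, 2e-2, line.shape) for _ in range(4)]
labels = label_consistent(smooth + jittery, dt=0.01)
print(labels)
```

In this sketch the "consistent" label is simply the cluster with the lower standardized feature sum, so no expert reference trajectory is needed. Which features actually predict learning performance is the question the paper's studies address, so the two used here should be read as placeholders.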
