Assessing Similarity Measures for the Evaluation of Human-Robot Motion Correspondence
Charles Dietzel, Patrick J. Martin
TL;DR
This work tackles the human-robot motion correspondence problem by introducing heterogeneous time-series similarity measures as quantitative evaluation tools to complement subjective surveys. It evaluates an implicit behavioral cloning approach against a baseline MLP using both quantitative similarity scores (GDTW, Soft-GDTW, DTW-GI, Soft DTW-GI) and qualitative human judgments gathered from dancers and engineers. The results show that GDTW and Soft-GDTW best align with human perception of motion similarity, suggesting these measures are valuable for assessing non-humanoid motion correspondence. The study highlights how quantitative metrics can augment qualitative surveys in HRI and points to future work on broader robot platforms and potential use as training losses.
Abstract
One key area of research in Human-Robot Interaction is solving the human-robot correspondence problem, which asks how a robot can learn to reproduce a human motion demonstration when the human and robot have different dynamics and kinematic structures. Evaluating solutions to this problem often requires qualitative surveys, which can be time-consuming to design and administer and whose results vary with the population of survey participants. In this paper, we propose heterogeneous time-series similarity measures as quantitative metrics for evaluating motion correspondence, complementing these qualitative surveys. To assess the suitability of these measures, we develop a behavioral cloning-based motion correspondence model and evaluate it with both a qualitative survey and the quantitative measures. By comparing the resulting similarity scores with the human survey results, we identify Gromov Dynamic Time Warping as a promising quantitative measure for evaluating motion correspondence.
