Assessing Similarity Measures for the Evaluation of Human-Robot Motion Correspondence

Charles Dietzel, Patrick J. Martin

TL;DR

This work tackles the human-robot motion correspondence problem by introducing heterogeneous time-series similarity measures as quantitative evaluation tools to complement subjective surveys. It evaluates an implicit behavioral cloning approach against a baseline MLP using both quantitative similarity scores (GDTW, Soft-GDTW, DTW-GI, Soft DTW-GI) and qualitative human judgments gathered from dancers and engineers. The results show that GDTW and Soft-GDTW best align with human perception of motion similarity, suggesting these measures are valuable for assessing non-humanoid motion correspondence. The study highlights how quantitative metrics can augment qualitative surveys in HRI and points to future work on broader robot platforms and potential use as training losses.

Abstract

One key area of research in Human-Robot Interaction is solving the human-robot correspondence problem, which asks how a robot can learn to reproduce a human motion demonstration when the human and robot have different dynamics and kinematic structures. Evaluating solutions to this problem often requires qualitative surveys, which can be time-consuming to design and administer. Additionally, qualitative survey results vary depending on the population of survey participants. In this paper, we propose using heterogeneous time-series similarity measures as quantitative metrics for evaluating motion correspondence, complementing these qualitative surveys. To assess the suitability of these measures, we develop a behavioral cloning-based motion correspondence model and evaluate it with both a qualitative survey and quantitative measures. By comparing the resulting similarity scores with the human survey results, we identify Gromov Dynamic Time Warping as a promising quantitative measure for evaluating motion correspondence.
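As background for the similarity measures named above, the following is a minimal sketch of classical dynamic time warping (DTW), the algorithm that the Gromov and global-invariance variants build on. It is not the authors' implementation and it is not GDTW: plain DTW assumes both series live in the same space, whereas the heterogeneous measures evaluated in the paper are designed for series in different spaces (for example, human pose versus robot joint states). The function name `dtw_distance`, the scalar inputs, and the absolute-difference cost are all assumptions chosen for illustration.

```python
import math


def dtw_distance(a, b):
    """Classical DTW cost between two 1-D sequences (illustrative sketch).

    Uses absolute difference as the pointwise cost, which is an assumption
    made for this example only.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the best alignment cost of a[:i] and b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]
```

Because the warping path may repeat samples, two series that trace the same shape at different speeds, such as `[0, 1, 2]` and `[0, 1, 1, 2]`, receive a cost of zero under this sketch.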

Paper Structure

This paper contains 17 sections, 4 figures, 4 tables.

Figures (4)

  • Figure 1: (a) The first stage of our approach collects kinesthetic and human-pose data to train behavioral-cloning based models. (b) The second stage deploys the resulting model from (a) onto the physical robot and evaluates the output qualitatively and quantitatively.
  • Figure 2: The data collection process that was used to build our data set for training and testing.
  • Figure 3: (a) An example output image from our human-pose data source and (b) the target non-humanoid robot.
  • Figure 4: Average time-series similarity scores for the IBC model and the MLP baseline. The black error bars show the standard deviation of the results.