Evaluating the Evaluators: Towards Human-aligned Metrics for Missing Markers Reconstruction

Taras Kucherenko, Derek Peristy, Judith Bütepage

TL;DR

This work addresses the evaluation gap in missing-marker reconstruction for MoCap by showing that standard MSE-based metrics do not correlate with perceived reconstruction quality. It introduces spatio-temporal metrics, Bone Distance Preservation (BDP) and Velocity Distance (VD), and grounds them in a user study with professional animators. Correlation analysis shows that VD (especially with ground truth) and BDP without GT align better with human judgments than RMSE, which highlights the need for human-aligned metrics and offers a path toward more meaningful evaluation in MoCap cleaning. The findings underscore the practical value of perceptually faithful metrics and encourage the development of standardized, task-specific datasets to drive progress in missing-marker reconstruction.

Abstract

Animation data is often obtained through optical motion capture systems, which utilize a multitude of cameras to establish the position of optical markers. However, system errors or occlusions can result in missing markers, the manual cleaning of which can be time-consuming. This has sparked interest in machine learning-based solutions for missing marker reconstruction in the academic community. Most academic papers utilize a simplistic mean square error as the main metric. In this paper, we show that this metric does not correlate with subjective perception of the fill quality. Additionally, we introduce and evaluate a set of better-correlated metrics that can drive progress in the field.

Paper Structure

This paper contains 23 sections, 11 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Hips outwards model illustration. Markers are filled one body part at a time in the following order: hips -> torso -> head -> limbs.
  • Figure 2: Screenshot from the data provided in the user study.
  • Figure 3: Results of the user study. Fractions of each rating (with y-axis on the left) as well as average scores (with y-axis on the right) with their 95% confidence intervals.
  • Figure 4: Results of the objective evaluation in cm with log-scale on the x-axis. RMSE stands for Root Mean Square Error, VD stands for Velocity Distance, GT stands for Ground Truth, and BDP stands for Bone Distance Preservation. Lower values indicate better performance. Systems are sorted according to their ratings in the user study.
  • Figure 5: Metric values plotted against the average subjective perception score for each stimulus considered. The metrics include RMSE, VD with GT, VD, BDP with GT, and BDP. For readability, we plot the data without the Additive Noise condition in the left column and with the Additive Noise condition in the right column.
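
To make the contrast between RMSE and a velocity-based metric concrete, here is a minimal Python sketch. The paper's exact definitions are given in its equations, which are not reproduced on this page, so the formulation below is an assumption: VD with GT is taken to be the RMSE between frame-to-frame finite differences of the reconstructed and ground-truth marker positions. The function names (`rmse`, `velocities`, `velocity_distance_with_gt`) and the toy data are hypothetical and chosen only for illustration.

```python
import math


def rmse(pred, truth):
    """Root mean square error over all frames, markers, and coordinates."""
    squared = [
        (p - t) ** 2
        for pred_frame, truth_frame in zip(pred, truth)
        for pred_marker, truth_marker in zip(pred_frame, truth_frame)
        for p, t in zip(pred_marker, truth_marker)
    ]
    return math.sqrt(sum(squared) / len(squared))


def velocities(seq):
    """Frame-to-frame finite differences for every marker coordinate."""
    return [
        [[b - a for a, b in zip(m0, m1)] for m0, m1 in zip(f0, f1)]
        for f0, f1 in zip(seq, seq[1:])
    ]


def velocity_distance_with_gt(pred, truth):
    """ASSUMED form of VD with GT: RMSE between the two velocity sequences."""
    return rmse(velocities(pred), velocities(truth))


# Toy data (hypothetical): one marker moving 1 cm per frame along x.
truth = [[(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)], [(2.0, 0.0, 0.0)], [(3.0, 0.0, 0.0)]]
# Fill A: smooth trajectory with a constant 1 cm offset in y.
offset = [[(x, y + 1.0, z)] for [(x, y, z)] in truth]
# Fill B: smaller positional error, but the sign flips every frame (jitter).
jitter = [[(x, y + (0.5 if i % 2 == 0 else -0.5), z)]
          for i, [(x, y, z)] in enumerate(truth)]

print("RMSE  offset:", rmse(offset, truth), " jitter:", rmse(jitter, truth))
print("VD+GT offset:", velocity_distance_with_gt(offset, truth),
      " jitter:", velocity_distance_with_gt(jitter, truth))
```

In this toy case the jittery fill has the lower RMSE, while the smooth but offset fill has the lower velocity distance. This only illustrates how the two kinds of metric can rank the same fills differently; it says nothing about the magnitudes or correlations reported in the paper.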