Comparison of Motion Encoding Frameworks on Human Manipulation Actions

Lennart Jahn, Florentin Wörgötter, Tomas Kulvicius

TL;DR

The paper benchmarks five movement encoding frameworks (DMPs, tbGMR/TP-GMM, SEDS, ProMPs, and OCPs) on a dataset of 7,652 human manipulation trajectories, comparing reconstruction accuracy and generalization to unseen start and end points. It shows that DMPs and OCPs encode movements most efficiently and accurately when enough kernels are used, and that DMPs, OCPs, and TP-GMM generalize comparably well; ProMPs match them only when trained on many demonstrations, and SEDS is difficult to fit and generalizes worst. The study provides detailed hyperparameter analyses, reveals model-specific tradeoffs (e.g., velocity oscillations in tbGMR/TP-GMM), and emphasizes the importance of task-dependent model selection for robotic trajectory representations. By releasing the dataset and a rigorous evaluation protocol, the work offers a practical resource for researchers to tailor trajectory encoding to specific manipulation tasks.

Abstract

Movement generation, and especially generalisation to unseen situations, plays an important role in robotics. Different types of movement generation methods exist, such as spline-based methods, dynamical-system-based methods, and methods based on Gaussian mixture models (GMMs). Using a large, new dataset on human manipulations, in this paper we provide a highly detailed comparison of five fundamentally different and widely used movement encoding and generation frameworks: dynamic movement primitives (DMPs), time-based Gaussian mixture regression (tbGMR), stable estimator of dynamical systems (SEDS), Probabilistic Movement Primitives (ProMP) and Optimal Control Primitives (OCP). We compare these frameworks with respect to their movement encoding efficiency, reconstruction accuracy, and movement generalisation capabilities. The new dataset consists of nine object manipulation actions performed by 12 humans: pick and place, put on top/take down, put inside/take out, hide/uncover, and push/pull, with a total of 7,652 movement examples. Our analysis shows that for movement encoding and reconstruction, DMPs and OCPs are the most efficient with respect to the number of parameters and reconstruction accuracy, if a sufficient number of kernels is used. In the case of movement generalisation to new start- and end-point situations, DMPs, OCPs and task-parameterized GMM (TP-GMM, a movement generalisation framework based on tbGMR) lead to similar performance, which ProMPs only achieve when using many demonstrations for learning. All models outperform SEDS, which additionally proves to be difficult to fit. Furthermore, we observe that TP-GMM and SEDS suffer from problems reaching the end-points of generalisations. These different quantitative results will help in selecting the most appropriate models and designing trajectory representations in an improved task-dependent way in future robotic applications.
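To make the "sufficient number of kernels" condition concrete, the following is a minimal sketch of how reconstruction error can depend on the kernel count. It is not the authors' implementation of any of the five frameworks: it fits a generic weighted sum of normalized Gaussian kernels to a synthetic one-dimensional trajectory by least squares. The toy trajectory, the kernel width rule, the mean-absolute-error metric, and the function names `gaussian_features` and `reconstruction_error` are all assumptions made for illustration; only the kernel counts 3, 6 and 11 are taken from the paper's figures.

```python
import numpy as np


def gaussian_features(t, n_kernels, width_scale=1.0):
    """Normalized Gaussian kernels with centers evenly spaced on [0, 1].

    The width rule (width = width_scale * center spacing) is an
    illustrative assumption, not a setting taken from the paper.
    """
    centers = np.linspace(0.0, 1.0, n_kernels)
    spacing = 1.0 / max(n_kernels - 1, 1)
    width = width_scale * spacing
    phi = np.exp(-0.5 * ((t[:, None] - centers[None, :]) / width) ** 2)
    # Normalize so that the kernel activations sum to one at every time step.
    return phi / phi.sum(axis=1, keepdims=True)


def reconstruction_error(t, y, n_kernels):
    """Fit kernel weights by least squares; return the mean absolute error."""
    phi = gaussian_features(t, n_kernels)
    weights, *_ = np.linalg.lstsq(phi, y, rcond=None)
    y_hat = phi @ weights
    return float(np.mean(np.abs(y - y_hat)))


# Synthetic 1-D "reach": a minimum-jerk profile plus a small oscillation.
# This stands in for a recorded trajectory; it is not data from the paper.
t = np.linspace(0.0, 1.0, 200)
y = 10 * t**3 - 15 * t**4 + 6 * t**5 + 0.1 * np.sin(4 * np.pi * t)

# Same kernel counts as in the paper's reconstruction example figure.
errors = {k: reconstruction_error(t, y, k) for k in (3, 6, 11)}
for k, err in errors.items():
    print(f"K={k:2d}  mean abs. error={err:.5f}")
```

In this toy setting, too few kernels cannot follow the small oscillation, so the error drops as kernels are added. The actual frameworks differ in how the kernels enter the model (for example as a forcing term in DMPs or as mixture components in tbGMR), so the curve shapes reported in the paper need not match this sketch.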

Paper Structure (65 sections, 17 equations, 13 figures, 3 tables)

Figures (13)

  • Figure 1: Recording setup.
  • Figure 2: Schematic representation of the selected movement target positions on a grid with 10 cm spacing (see also Figure \ref{fig:exampleview}). The start/end position is labeled S; the target positions are divided into groups indicated by color and number.
  • Figure 3: Examples of trajectories from the movement dataset. Green and red dots denote start- and end-points, respectively.
  • Figure 4: Example of human trajectories of a take down action and reconstructions for different numbers of kernels: 3 (a), 6 (b) and 11 (c). Position and velocity profiles are shown in the top and bottom rows, respectively, whereas the plots at the bottom of each panel show the deviation from the human trajectory. Green and red dots denote start- and end-points, respectively.
  • Figure 5: Comparison of movement encoding frameworks on the movement reconstruction task. Median reconstruction error vs. number of kernels is shown for each model. Failed SEDS encodings are not included in these statistics (see Section \ref{sec:sedsconvergence}). Note that there is no result for SEDS for $K=11$ because the encoding time for high $K$ values using the repeated optimization method (see Section \ref{sec:seds_model}) is impractically long.
  • ...and 8 more figures