Comparison of Motion Encoding Frameworks on Human Manipulation Actions

Lennart Jahn; Florentin Wörgötter; Tomas Kulvicius

TL;DR

The paper benchmarks five movement encoding frameworks (DMPs, tbGMR/TP-GMM, SEDS, ProMPs, and OCPs) on a large dataset of human manipulation trajectories, comparing reconstruction accuracy and generalization to unseen start and end points. DMPs and OCPs encode and reconstruct movements most efficiently and accurately, provided enough kernels are used. For generalization, DMPs, OCPs, and TP-GMM perform comparably; ProMPs reach that level only when trained on many demonstrations, and SEDS often fails to converge and generalizes worse than the other models. The study also provides detailed hyperparameter analyses, reveals model-specific tradeoffs (e.g., velocity oscillations in tbGMR/TP-GMM), and emphasizes task-dependent model selection for robotic trajectory representations. By releasing the dataset and a rigorous evaluation protocol, the work offers a practical resource for researchers tailoring trajectory encoding to specific manipulation tasks.

Abstract

Movement generation, and especially generalisation to unseen situations, plays an important role in robotics. Different types of movement generation methods exist, such as spline-based methods, dynamical-system-based methods, and methods based on Gaussian mixture models (GMMs). Using a large, new dataset of human manipulations, in this paper we provide a highly detailed comparison of five fundamentally different and widely used movement encoding and generation frameworks: dynamic movement primitives (DMPs), time-based Gaussian mixture regression (tbGMR), stable estimator of dynamical systems (SEDS), probabilistic movement primitives (ProMPs), and optimal control primitives (OCPs). We compare these frameworks with respect to their movement encoding efficiency, reconstruction accuracy, and movement generalisation capabilities. The new dataset consists of nine object manipulation actions performed by 12 humans: pick and place, put on top/take down, put inside/take out, hide/uncover, and push/pull, with a total of 7,652 movement examples. Our analysis shows that for movement encoding and reconstruction, DMPs and OCPs are the most efficient with respect to the number of parameters and reconstruction accuracy, if a sufficient number of kernels is used. For movement generalisation to new start- and end-point situations, DMPs, OCPs, and task-parameterized GMM (TP-GMM, a movement generalisation framework based on tbGMR) lead to similar performance, which ProMPs only achieve when many demonstrations are used for learning. All models outperform SEDS, which additionally proves to be difficult to fit. Furthermore, we observe that TP-GMM and SEDS have problems reaching the end-points of generalisations. These quantitative results will help in selecting the most appropriate models and in designing trajectory representations in an improved, task-dependent way in future robotic applications.
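To make the notions of "number of kernels", "reconstruction accuracy", and "generalisation to new start- and end-points" concrete, the following is a minimal sketch of a one-dimensional discrete DMP in its generic textbook form. It is not the paper's implementation and may differ from the DMP variant the authors evaluate. The class name `DMP1D`, the gains `alpha_z` and `alpha_x`, the kernel-width heuristic, and the synthetic demonstration are all assumptions made for illustration; a 3D trajectory would use one such DMP per coordinate.

```python
import numpy as np


class DMP1D:
    """One-dimensional discrete DMP (generic form; illustrative assumption)."""

    def __init__(self, n_kernels, alpha_z=25.0, alpha_x=4.0):
        if n_kernels < 2:
            raise ValueError("need at least two kernels")
        # beta_z = alpha_z / 4 makes the spring-damper critically damped.
        self.alpha_z, self.beta_z, self.alpha_x = alpha_z, alpha_z / 4.0, alpha_x
        # Centres equally spaced in time, expressed in phase x = exp(-alpha_x * t / tau).
        self.centers = np.exp(-alpha_x * np.linspace(0.0, 1.0, n_kernels))
        spacing = np.diff(self.centers)
        self.widths = 1.0 / np.append(spacing, spacing[-1]) ** 2  # heuristic widths
        self.w = np.zeros(n_kernels)

    def _features(self, x):
        # Normalised Gaussian kernels, gated by the phase so the forcing term fades out.
        x = np.atleast_1d(x)[:, None]
        psi = np.exp(-self.widths * (x - self.centers) ** 2)
        return psi / psi.sum(axis=1, keepdims=True) * x

    def fit(self, y, dt):
        """Fit kernel weights to one demonstration (requires y[0] != y[-1])."""
        y = np.asarray(y, dtype=float)
        self.tau = (len(y) - 1) * dt
        self.y0, self.g = y[0], y[-1]
        yd = np.gradient(y, dt)
        ydd = np.gradient(yd, dt)
        x = np.exp(-self.alpha_x * np.arange(len(y)) * dt / self.tau)
        # Forcing term that would reproduce the demonstration exactly.
        f_target = self.tau**2 * ydd - self.alpha_z * (
            self.beta_z * (self.g - y) - self.tau * yd
        )
        design = self._features(x) * (self.g - self.y0)
        self.w = np.linalg.lstsq(design, f_target, rcond=None)[0]
        return self

    def rollout(self, n_steps, dt, y0=None, g=None):
        """Integrate the DMP with explicit Euler, optionally with a new start and goal."""
        y0 = self.y0 if y0 is None else y0
        g = self.g if g is None else g
        y, z = y0, 0.0
        out = np.empty(n_steps)
        for i in range(n_steps):
            out[i] = y
            x = np.exp(-self.alpha_x * i * dt / self.tau)
            f = (self._features(x)[0] @ self.w) * (g - y0)
            zd = (self.alpha_z * (self.beta_z * (g - y) - z) + f) / self.tau
            y, z = y + z / self.tau * dt, z + zd * dt
        return out


# Synthetic demonstration (assumption): a minimum-jerk reach with two small bumps.
dt, n = 0.002, 501
s = np.linspace(0.0, 1.0, n)
demo = 0.3 * (10 * s**3 - 15 * s**4 + 6 * s**5) + 0.05 * np.sin(2 * np.pi * s) ** 2

# Reconstruction error for the kernel counts shown in Figure 4.
errors = {}
for k in (3, 6, 11):
    dmp = DMP1D(k).fit(demo, dt)
    errors[k] = float(np.sqrt(np.mean((dmp.rollout(n, dt) - demo) ** 2)))
    print(f"K={k:2d}  RMSE={errors[k]:.5f}")

# Generalisation: reuse the learned weights with a new start and end point.
generalised = dmp.rollout(n, dt, y0=0.1, g=0.5)
```

In this formulation the forcing term scales with the distance between start and goal, so changing `y0` and `g` stretches and shifts the learned shape. That scaling is one common design choice among several, and it breaks down when start and goal coincide.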

Paper Structure (65 sections, 17 equations, 13 figures, 3 tables)

Introduction
Motion encoding frameworks
Own contribution
Dataset
Recording setup
Manipulation actions
Extraction of the 3D trajectories
Movement Encoding Frameworks
Dynamical Movement Primitives
Model description
Generalization using DMPs
Task Parameterized Gaussian Mixture Models
Model description
Time based GMR
Task parameterized GMM
...and 50 more sections

Figures (13)

Figure 1: Recording setup.
Figure 2: Schematic representation of the selected movement target positions on a grid with 10 cm spacing (see also Figure \ref{fig:exampleview}). The start/end position is labeled S; the target positions are grouped by color and number.
Figure 3: Examples of trajectories from the movement dataset. Green and red dots denote start- and end-points, respectively.
Figure 4: Example of human trajectories of a take down action and reconstructions for different numbers of kernels: 3 (a), 6 (b) and 11 (c). Position and velocity profiles are shown in the top and bottom rows, respectively, whereas the plots at the bottom of each panel show the deviation from the human trajectory. Green and red dots denote start- and end-points, respectively.
Figure 5: Comparison of movement encoding frameworks on the movement reconstruction task. Median reconstruction error vs. number of kernels is shown for each model. Failed SEDS encodings are not included in these statistics (see Section \ref{sec:sedsconvergence}). Note that there is no result for SEDS for $K=11$ because the encoding time for high $K$ values using the repeated optimization method (see Section \ref{sec:seds_model}) is impractically long.
...and 8 more figures