Redefining Data Pairing for Motion Retargeting Leveraging a Human Body Prior

Xiyana Figuera; Soogeun Park; Hyemin Ahn

TL;DR

A two-stage motion retargeting neural network that can be trained via supervised learning on a large amount of paired data. Compared with other learning-based methods trained via unsupervised learning, the deep neural network trained with ample high-quality paired data achieved notable performance.

Abstract

We propose MR HuBo (Motion Retargeting leveraging a HUman BOdy prior), a cost-effective and convenient method for collecting high-quality upper-body paired <robot, human> pose data, which is essential for data-driven motion retargeting methods. Unlike existing approaches, which collect <robot, human> pose data by converting human MoCap poses into robot poses, our method works in reverse: we first sample diverse random robot poses and then convert them into human poses. However, since random robot poses can result in extreme and infeasible human poses, we propose an additional technique that filters out extreme poses by exploiting a human body prior trained on a large amount of human pose data. Our data collection method can be applied to any humanoid robot, provided that one designs or optimizes the system's hyperparameters, which include a size scale factor and the joint angle ranges for sampling. In addition to this data collection method, we present a two-stage motion retargeting neural network that can be trained via supervised learning on a large amount of paired data. We found that our deep neural network, trained with ample high-quality paired data, achieved notable performance compared with other learning-based methods trained via unsupervised learning. Our experiments also show that our data filtering method yields better retargeting results than training the model on raw, noisy data. Our code and video results are available at https://sites.google.com/view/mr-hubo/
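The reverse pairing and filtering described above can be illustrated with a minimal, framework-free Python sketch. It is not the authors' implementation: `robot_to_human` is a hypothetical stand-in for forward kinematics followed by the VPoser_IK solver, `denoise` is a hypothetical stand-in for a VPoser encode-decode pass, the threshold value is arbitrary, uniform sampling within each joint range is assumed, and poses are plain lists of floats rather than (6D) SMPL parameters.

```python
import random
from typing import Callable, List, Sequence, Tuple

Pose = List[float]


def sample_robot_pose(q_min: Sequence[float], q_max: Sequence[float],
                      rng: random.Random) -> Pose:
    """Draw each joint angle from its [min, max] range (uniform sampling assumed)."""
    return [rng.uniform(lo, hi) for lo, hi in zip(q_min, q_max)]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally sized parameter vectors."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def collect_pairs(
    n_samples: int,
    q_min: Sequence[float],
    q_max: Sequence[float],
    robot_to_human: Callable[[Pose], Pose],  # hypothetical: FK followed by VPoser_IK
    denoise: Callable[[Pose], Pose],         # hypothetical: VPoser encode + decode
    threshold: float,                        # hypothetical MSE cut-off
    seed: int = 0,
) -> List[Tuple[Pose, Pose]]:
    """Return <robot, human> pairs whose human pose survives the prior-based filter."""
    rng = random.Random(seed)
    pairs: List[Tuple[Pose, Pose]] = []
    for _ in range(n_samples):
        q = sample_robot_pose(q_min, q_max, rng)  # random robot pose q
        h = robot_to_human(q)                     # corresponding human pose H
        h_tilde = denoise(h)                      # denoised pose H~
        if mse(h, h_tilde) <= threshold:          # keep only poses the prior barely changes
            pairs.append((q, h))
    return pairs


# Toy stand-ins so the sketch runs end to end (NOT real kinematics or VPoser).
def toy_robot_to_human(q: Pose) -> Pose:
    return [0.5 * x for x in q]


def toy_denoise(h: Pose) -> Pose:
    # Crude "prior": values in [-1, 1] count as plausible; larger ones are clamped back.
    return [max(-1.0, min(1.0, x)) for x in h]


if __name__ == "__main__":
    data = collect_pairs(1000, [-3.0] * 4, [3.0] * 4,
                         toy_robot_to_human, toy_denoise, threshold=0.01)
    print(f"kept {len(data)} of 1000 sampled poses")
```

With the toy stand-ins, the clamping "prior" leaves moderate poses unchanged and pulls extreme ones back toward the plausible range, so only samples whose reconstruction error stays under the threshold are kept. A real pipeline would substitute the robot's own kinematics and a trained VPoser model.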

Paper Structure (16 sections, 3 equations, 5 figures, 2 tables, 1 algorithm)

INTRODUCTION
RELATED WORK
Optimization-based Motion Retargeting
Data-driven Motion Retargeting
METHODOLOGY
Overview
Data Pairing
Two-Stage Motion Retargeting Network
EXPERIMENTS
Dataset
Evaluation Metrics
Motion Retargeting Network Training Setup
Comparison to the Unsupervised Learning Method
Ablation Study: Pose Filtering and Network Architecture
Motion Retargeting in Real-world
...and 1 more section

Figures (5)

Figure 1: Visualization of how our data sampling method works. It randomly samples a robot pose (left) and converts it to the corresponding human pose (middle). The obtained human pose is then reconstructed into a denoised pose (right). When random robot pose sampling results in extreme poses (bottom), we found that the difference between the original and denoised poses becomes larger. Based on this, we collect only feasible poses (top) whose differences from their denoised versions are small.
Figure 2: Overview of our data sampling process and extreme-pose filtering component. (a) A robot pose $\mathbf{q}$ is sampled from the valid min-max joint angle range and (b) converted to the XYZ position $P$ and (6D) orientation $R$ using forward kinematics. (c) The inverse kinematics solver $\mathrm{VPoser}_{IK}$ uses $P$ to obtain (6D) SMPL human pose parameters $H$. Then, (d) by passing $H$ through the encoder and decoder of VPoser, its denoised version $\tilde{H}$ is obtained. Finally, (e) we measure the mean squared error (MSE) between $H$ and $\tilde{H}$ to detect noisy poses, and we discard the data if the MSE is larger than a certain threshold.
Figure 3: The proposed supervised two-stage motion retargeting pipeline. An RGB image $I_t$ of a human pose from a video is converted to SMPL pose parameters using a mesh recovery network. These (6D) SMPL pose parameters $H_t$ are input to the pre network $\mathcal{F}_{pre}$, which converts them to the corresponding robot pose (6D) orientation $R_t$. Then, $R_t$ is input to the post network $\mathcal{F}_{post}$, which maps it to the corresponding robot joint angles $\mathbf{q}_t$.
Figure 4: Visualization comparing our method with the baseline in determining the XYZ positions of links on evaluation-set poses.
Figure 5: Real-time retargeting of human poses from RGB images onto Reachy in a real-world environment.
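Figures 2 and 3 describe robot orientations and SMPL pose parameters in a "(6D)" form. Assuming this is the widely used continuous 6D rotation representation (two columns of a 3×3 rotation matrix, mapped back to a valid rotation by Gram-Schmidt orthonormalization), the conversion can be sketched as follows. The function names are illustrative rather than taken from the authors' code, and whether columns or rows are kept is a convention that differs between libraries.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]  # 3x3 rotation matrix, row-major


def matrix_to_6d(rot: Matrix) -> List[float]:
    """Keep the first two columns of a 3x3 rotation matrix as a flat 6-vector."""
    col1 = [rot[i][0] for i in range(3)]
    col2 = [rot[i][1] for i in range(3)]
    return col1 + col2


def _normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    if norm < 1e-12:
        raise ValueError("cannot normalize a (near-)zero vector")
    return [x / norm for x in v]


def _cross(a: Sequence[float], b: Sequence[float]) -> List[float]:
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def sixd_to_matrix(d6: Sequence[float]) -> Matrix:
    """Map any 6-vector whose two halves are not parallel to a proper rotation matrix."""
    if len(d6) != 6:
        raise ValueError("expected a 6-element vector")
    a1, a2 = list(d6[:3]), list(d6[3:6])
    b1 = _normalize(a1)
    # Gram-Schmidt: remove the b1 component from a2, then normalize.
    dot = sum(x * y for x, y in zip(b1, a2))
    b2 = _normalize([y - dot * x for x, y in zip(b1, a2)])
    b3 = _cross(b1, b2)  # third column completes a right-handed frame
    # b1, b2, b3 are the columns; rebuild the row-major matrix.
    return [[b1[i], b2[i], b3[i]] for i in range(3)]
```

One reason this form suits regression networks such as $\mathcal{F}_{pre}$ is that any predicted 6-vector with non-parallel halves maps back to a valid rotation, so a slightly inaccurate output still yields a proper orientation.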
