ReMP: Reusable Motion Prior for Multi-domain 3D Human Pose Estimation and Motion Inbetweening

Hojun Jang, Young Min Kim

TL;DR

Reusable Motion prior (ReMP), an effective motion prior that can accurately track the temporal evolution of motion in various downstream tasks, consistently outperforms the baseline method on diverse and practical 3D motion data, including depth point clouds, LiDAR scans, and IMU sensor data.

Abstract

We present Reusable Motion prior (ReMP), an effective motion prior that can accurately track the temporal evolution of motion in various downstream tasks. Inspired by the success of foundation models, we argue that a robust spatio-temporal motion prior can encapsulate underlying 3D dynamics applicable to various sensor modalities. We learn the rich motion prior from a sequence of complete parametric models of posed human body shape. By employing a temporal attention mechanism, our prior can easily estimate poses for missing frames or from noisy measurements, even under significant occlusion. More interestingly, when the input measurements are incomplete and challenging, our prior can guide the system to quickly extract the information critical for estimating the sequence of poses, significantly improving the training efficiency for mesh sequence recovery. ReMP consistently outperforms the baseline method on diverse and practical 3D motion data, including depth point clouds, LiDAR scans, and IMU sensor data. The project page is available at https://hojunjang17.github.io/ReMP.

Paper Structure

This paper contains 20 sections, 7 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: We extract rich motion priors from the large-scale motion dataset and reuse them for various applications, such as 3D human pose estimation and motion inbetweening.
  • Figure 2: The overall pipeline of our method consists of two parts: (a) training the motion prior and (b) reusing the pretrained prior. In the motion prior training phase, a sequence of pose parameters $\theta_{1:T}$ and the root translation transitions $\Delta x_{1:T}$ form a sequence of motion parameters $M_{1:T}$. We use a transformer encoder and MLP layers to generate Gaussian distributions from which we sample the latent vectors. We feed the latent vectors to a transformer decoder to generate the motion parameters, which are then converted to the SMPL parameters. After training the prior, we freeze all the networks used in the first phase. In the reusing phase, we encode the input data and use a transformer encoder to generate a distribution, from which we sample the latent vectors for the transformer decoder. We use an additional shape parameter estimator for $\beta$. Finally, we combine all three parameters with the SMPL layer to reconstruct the human motion.
  • Figure 3: Results of ReMP and the baselines on synthetic CMU depth point cloud data. The colors on the mesh indicate the displacement from the ground truth vertices.
  • Figure 4: Pose estimation results of ReMP and the baselines on the SLOPER4D dataset. The colors on the mesh indicate the displacement from the ground truth vertices.
  • Figure 5: Motion reconstruction results of ReMP and the baselines from IMU sensor data on the TotalCapture dataset. The colors on the mesh indicate the displacement from the ground truth vertices.
  • ...and 2 more figures
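
The Figure 2 caption describes two data-level steps that are easy to illustrate without any network: forming the motion parameters $M_{1:T}$ from $\theta_{1:T}$ and $\Delta x_{1:T}$, and sampling a latent vector from a predicted Gaussian. The following is a minimal sketch, not the authors' implementation. It assumes that $\Delta x_t$ is the frame-to-frame difference of the root position with a zero transition for the first frame, that $M_t$ is a plain concatenation, and that the Gaussian is parameterized by a mean and a log-variance. All function names are hypothetical, and the transformer encoder and decoder are left out entirely.

```python
import math
import random


def root_transitions(positions):
    """Frame-to-frame root displacements; the first frame gets zeros (an assumption)."""
    deltas = [[0.0, 0.0, 0.0]]
    for prev, cur in zip(positions, positions[1:]):
        deltas.append([c - p for p, c in zip(prev, cur)])
    return deltas


def build_motion_params(thetas, positions):
    """Concatenate each frame's pose vector with its root transition: M_t = [theta_t, dx_t]."""
    deltas = root_transitions(positions)
    return [list(theta) + dx for theta, dx in zip(thetas, deltas)]


def sample_latent(mu, log_var, rng):
    """Reparameterized draw z = mu + sigma * eps, with eps ~ N(0, 1) per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def integrate_root(deltas, start):
    """Recover absolute root positions by accumulating the transitions from a start point."""
    out, cur = [], list(start)
    for dx in deltas:
        cur = [c + d for c, d in zip(cur, dx)]
        out.append(cur)
    return out


# Toy sequence: 3 frames, 2-dimensional pose vectors (real pose vectors are much larger).
thetas = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
positions = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 2.0, 0.0]]

motion = build_motion_params(thetas, positions)
latent = sample_latent([0.0, 1.0], [0.0, -1.0], random.Random(0))
recovered = integrate_root(root_transitions(positions), positions[0])
```

Representing the root by transitions rather than absolute positions makes the motion parameters independent of where the sequence starts; the absolute trajectory can be recovered afterwards by accumulation, as `integrate_root` does.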