MotionPRO: Exploring the Role of Pressure in Human MoCap and Beyond

Shenghao Ren, Yi Lu, Jiayi Huang, Jiayi Zhao, He Zhang, Tao Yu, Qiu Shen, Xun Cao

TL;DR

This work addresses the gap between geometry-focused human MoCap and physical interaction by leveraging whole-body pressure data. It introduces MotionPRO, a large-scale multimodal dataset with Pressure, RGB, and Optical sensors (70 volunteers, 400 motions, 12.4M frames), and FRAPPE, a baseline that fuses pressure and RGB for accurate pose and globally plausible trajectory estimation. Two evaluation tasks show that pressure alone yields accurate global translation and plausible lower-body pose, while fusion with RGB via cross-attention and orthographic constraints improves global pose and trajectory under occlusions. Extended experiments demonstrate improved humanoid robot actuation and stability, highlighting practical impact for embodied AI.

Abstract

Existing human Motion Capture (MoCap) methods mostly focus on visual similarity while neglecting physical plausibility. As a result, downstream tasks such as driving virtual humans in 3D scenes or humanoid robots in the real world suffer from issues such as timing drift and jitter, spatial problems like sliding and penetration, and poor global trajectory accuracy. In this paper, we revisit human MoCap from the perspective of the interaction between the human body and the physical world by exploring the role of pressure. Firstly, we construct a large-scale human Motion capture dataset with Pressure, RGB and Optical sensors (named MotionPRO), which comprises 70 volunteers performing 400 types of motion, encompassing a total of 12.4M pose frames. Secondly, we examine both the necessity and effectiveness of the pressure signal through two challenging tasks: (1) pose and trajectory estimation based solely on pressure: we propose a network that incorporates a small-kernel decoder and a long-short-term attention module, and show that pressure can provide an accurate global trajectory and a plausible lower-body pose. (2) pose and trajectory estimation by fusing pressure and RGB: we impose constraints on orthographic similarity along the camera axis and whole-body contact along the vertical axis to enhance the cross-attention strategy used to fuse pressure and RGB feature maps. Experiments demonstrate that fusing pressure with RGB features not only significantly improves performance in terms of objective metrics, but also plausibly drives virtual humans (SMPL) in 3D scenes. Furthermore, we demonstrate that incorporating physical perception enables humanoid robots to perform more precise and stable actions, which is highly beneficial for the development of embodied artificial intelligence. The project page is available at: https://nju-cite-mocaphumanoid.github.io/MotionPRO/
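To make the fusion idea in the abstract concrete, the following is a minimal sketch of single-head cross-attention between two sets of feature tokens. It is not FRAPPE's architecture: the choice of RGB tokens as queries and pressure tokens as keys/values, the token counts, the feature widths, and the random projection matrices (`w_q`, `w_k`, `w_v`) are all assumptions made purely for illustration, and the orthographic and contact constraints are not modeled.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(rgb_tokens, pressure_tokens, w_q, w_k, w_v):
    """Single-head cross-attention (illustrative only).

    rgb_tokens:      (n_rgb, d_rgb) flattened RGB feature map, used as queries
    pressure_tokens: (n_p, d_p)     flattened pressure feature map, used as keys/values
    Returns the fused tokens (n_rgb, d) and the attention weights (n_rgb, n_p).
    """
    q = rgb_tokens @ w_q        # (n_rgb, d)
    k = pressure_tokens @ w_k   # (n_p, d)
    v = pressure_tokens @ w_v   # (n_p, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product scores
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights


# Hypothetical sizes: a 7x7 RGB feature map and an 8x8 pressure feature map.
rng = np.random.default_rng(0)
n_rgb, n_p, d_rgb, d_p, d = 49, 64, 32, 16, 24
rgb_tokens = rng.standard_normal((n_rgb, d_rgb))
pressure_tokens = rng.standard_normal((n_p, d_p))

# Random projections stand in for learned weights.
w_q = rng.standard_normal((d_rgb, d)) / np.sqrt(d_rgb)
w_k = rng.standard_normal((d_p, d)) / np.sqrt(d_p)
w_v = rng.standard_normal((d_p, d)) / np.sqrt(d_p)

fused, weights = cross_attention(rgb_tokens, pressure_tokens, w_q, w_k, w_v)
```

Each row of `weights` describes how strongly one RGB location attends to every pressure location, so `fused` carries pressure information aligned to the RGB token grid. In a trained model the projections would be learned rather than sampled.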

Paper Structure

This paper contains 22 sections, 8 equations, 16 figures, 6 tables.

Figures (16)

  • Figure 1: MotionPRO is a large-scale human Motion capture dataset with Pressure, RGB and Optical sensors, which comprises 70 volunteers performing 400 types of motion, encompassing a total of 12.4M pose frames.
  • Figure 2: The architecture of our motion capture system for dataset collection.
  • Figure 3: Hierarchical distribution of 400 motion types.
  • Figure 4: Pose and Trajectory estimation using only pressure.
  • Figure 5: The framework of FRAPPE which fuses pressure and RGB for global pose and trajectory estimation.
  • ...and 11 more figures