Ground Reaction Inertial Poser: Physics-based Human Motion Capture from Sparse IMUs and Insole Pressure Sensors

Ryosuke Hori, Jyun-Ting Song, Zhengyi Luo, Jinkun Cao, Soyong Shin, Hideo Saito, Kris Kitani

Abstract

We propose Ground Reaction Inertial Poser (GRIP), a method that reconstructs physically plausible human motion using four wearable devices. Unlike conventional IMU-only approaches, GRIP combines IMU signals with foot pressure data to capture both body dynamics and ground interactions. Furthermore, rather than relying solely on kinematic estimation, GRIP uses a digital twin of a person, in the form of a synthetic humanoid in a physics simulator, to reconstruct realistic and physically plausible motion. At its core, GRIP consists of two modules: KinematicsNet, which estimates body poses and velocities from sensor data, and DynamicsNet, which controls the humanoid in the simulator using the residual between the KinematicsNet prediction and the simulated humanoid state. To enable robust training and fair evaluation, we introduce a large-scale dataset, Pressure and Inertial Sensing for Human Motion and Interaction (PRISM), that captures diverse human motions with synchronized IMUs and insole pressure sensors. Experimental results show that GRIP outperforms existing IMU-only and IMU-pressure fusion methods across all evaluated datasets, achieving higher global pose accuracy and improved physical consistency.
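The two-module loop described in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: it replaces the humanoid with a 1D point mass, stands in for KinematicsNet with a pass-through function, and stands in for DynamicsNet with a hand-tuned PD law. All names, gains, and the time step are assumptions chosen for illustration. It shows only the data flow: estimate a kinematic target, compute its residual against the simulated state, turn the residual into a control force, and step the simulator.

```python
from dataclasses import dataclass


@dataclass
class State:
    pos: float
    vel: float


def kinematics_estimate(sensor_pos: float, sensor_vel: float) -> State:
    # Hypothetical stand-in for KinematicsNet: the toy "sensor" already
    # reports position and velocity, so it is passed through unchanged.
    return State(sensor_pos, sensor_vel)


def state_difference(target: State, sim: State) -> State:
    # Residual between the kinematic estimate and the simulated state.
    return State(target.pos - sim.pos, target.vel - sim.vel)


def dynamics_control(residual: State, kp: float = 100.0, kd: float = 20.0) -> float:
    # Hypothetical stand-in for DynamicsNet: a PD law that maps the
    # residual to a force. The gains are assumed values, not from the paper.
    return kp * residual.pos + kd * residual.vel


def simulate_step(sim: State, force: float, mass: float = 1.0, dt: float = 0.01) -> State:
    # Semi-implicit Euler step of a 1D point mass (toy physics simulator).
    vel = sim.vel + (force / mass) * dt
    pos = sim.pos + vel * dt
    return State(pos, vel)


def track(targets, dt: float = 0.01):
    # Closed loop: estimate -> residual -> control -> simulate.
    sim = State(0.0, 0.0)
    trajectory = []
    for sensor_pos, sensor_vel in targets:
        target = kinematics_estimate(sensor_pos, sensor_vel)
        residual = state_difference(target, sim)
        force = dynamics_control(residual)
        sim = simulate_step(sim, force, dt=dt)
        trajectory.append(sim)
    return trajectory


if __name__ == "__main__":
    # Track a constant target at position 1.0 for 500 steps.
    final = track([(1.0, 0.0)] * 500)[-1]
    print(final.pos, final.vel)
```

Because the simulated state, rather than the kinematic estimate, is what gets reported, the output is constrained by the simulator's dynamics at every step. In this toy setting that simply means the point mass converges smoothly to the target instead of jumping to it.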

Paper Structure (23 sections, 9 equations, 10 figures, 5 tables)

Figures (10)

  • Figure 1: Overview of the proposed Ground Reaction Inertial Poser (GRIP). (a) GRIP observes motion using four IMUs and foot pressure data from smartwatches and smart insoles. (b) Full-body motion is reconstructed by driving a humanoid with joint torques in a physics simulator. (c) The PRISM dataset offers multimodal measurements, including IMUs, foot pressure, motion data, and environmental data.
  • Figure 2: Overview of the GRIP framework. Input Data (Sec. \ref{subsec:input_data}) consists of IMU and insole measurements. KinematicsNet (Sec. \ref{subsec:kin_net}) estimates kinematic states, and the State Difference (Sec. \ref{subsec:state_diff}) compares them with the simulated humanoid. DynamicsNet (Sec. \ref{subsec:dyn_net}) drives the humanoid through physics simulation-based control. The PRISM dataset (Sec. \ref{subsec:prism}) provides diverse multi-modal data.
  • Figure 3: Qualitative comparison of pose estimation results across the three datasets. Our method accurately reconstructs foot placement on objects (PRISM), exhibits less position drift (UnderPressure), and captures slow weight-shifting motions (PSU-TMM100).
  • Figure 4: Comparison of foot contact timing. Right-foot contact labels during low-speed motions in PSU-TMM100, computed from the estimated GRF of the physics-based methods.
  • Figure 5: Qualitative comparison of estimated poses and root trajectories for a walking sequence from the UnderPressure dataset. Colors correspond to the same methods shown in Fig. \ref{fig:compare1}.
  • ...and 5 more figures