Dynamic Inertial Poser (DynaIP): Part-Based Motion Dynamics Learning for Enhanced Human Pose Estimation with Sparse Inertial Sensors

Yu Zhang; Songpengcheng Xia; Lei Chu; Jiarui Yang; Qi Wu; Ling Pei

TL;DR

DynaIP tackles sparse IMU-based human pose estimation in two ways. First, it unifies real inertial mocap data across skeleton formats through a global orientation mapping. Second, it learns motion dynamics with a two-stage architecture that first regresses pseudo-velocity and then estimates full-body pose. For robustness, a part-based model splits the body into three regions (upper limbs, torso, lower limbs) and fuses the local region outputs with global context. The combination of unified real data, velocity-informed dynamics, and region-aware modeling yields state-of-the-art performance across five public datasets, notably reducing pose error on DIP-IMU by 19%, and generalizes well to diverse motions. The work has practical implications for real-time pose capture with few sensors and points toward future directions in data diversity, global coherence, and integration with consumer devices and additional sensor modalities.

Abstract

This paper introduces a novel human pose estimation approach using sparse inertial sensors, addressing the shortcomings of previous methods that rely on synthetic data. It leverages a diverse array of real inertial motion capture data from different skeleton formats to improve motion diversity and model generalization. The method features two innovative components: a pseudo-velocity regression model for capturing dynamic motion with inertial sensors, and a part-based model that divides the body and sensor data into three regions, each modeled according to its own characteristics. The approach outperforms state-of-the-art models across five public datasets, notably reducing pose error by 19% on the DIP-IMU dataset, a significant improvement in inertial sensor-based human pose estimation. Our code is available at https://github.com/dx118/dynaip.
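
As a rough illustration of what a velocity target for the first (pseudo-velocity regression) stage could look like, the sketch below computes per-joint velocities by finite differences over a sequence of joint positions. This is an assumption for illustration only: the abstract does not say how the velocity targets are derived, and the function name `finite_difference_velocity`, the data layout, and the zero velocity assigned to the first frame are all hypothetical choices, not the authors' implementation.

```python
from typing import List, Sequence

Vec3 = Sequence[float]


def finite_difference_velocity(
    positions: List[List[Vec3]], fps: float
) -> List[List[List[float]]]:
    """Backward-difference joint velocities (hypothetical helper).

    positions[t][j] is the 3D position of joint j at frame t.
    Returns velocities[t][j] in position units per second.
    Frame 0 has no predecessor, so it is assigned zero velocity
    (an assumption made here for simplicity).
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    if not positions:
        return []

    num_joints = len(positions[0])
    velocities = [[[0.0, 0.0, 0.0] for _ in range(num_joints)]]

    # v_t = (p_t - p_{t-1}) * fps for every joint and every axis.
    for prev_frame, curr_frame in zip(positions, positions[1:]):
        frame_velocity = [
            [(c - p) * fps for p, c in zip(p_prev, p_curr)]
            for p_prev, p_curr in zip(prev_frame, curr_frame)
        ]
        velocities.append(frame_velocity)
    return velocities


if __name__ == "__main__":
    # One joint moving along x, sampled at 2 frames per second.
    trajectory = [[[0.0, 0.0, 0.0]], [[0.5, 0.0, 0.0]], [[1.5, 1.0, 0.0]]]
    print(finite_difference_velocity(trajectory, fps=2.0))
```

In a two-stage design of the kind the abstract describes, targets like these would supervise the first stage, and the predicted velocities would then be passed to the second stage alongside the IMU measurements.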

Paper Structure (18 sections, 8 equations, 12 figures, 8 tables)

Introduction
Related Work
Method
Training Data Unified across Skeleton Formats
Two-stage Human Pose Estimation with Pseudo Velocity Regression
Stage I: pseudo velocity regression
Stage II: human pose estimation
Learning Part-based 3D Human Dynamics with Three Local Body Regions
Experiments
Impact of the Unified Inertial Mocap Data and Virtual-to-Real Training Scheme
Overall Performance Comparison on the Unified Inertial Mocap Data
Ablations on the Components of our Model
Conclusions
Implementation Details
Datasets Details
...and 3 more sections

Figures (12)

Figure 1: Our innovative data-driven approach for robust full-body pose estimation using six IMUs: unifying inertial mocap datasets across skeleton formats and enhancing challenging motion capture with local body region modeling and pseudo-velocity estimation.
Figure 2: Overview of our proposed method, a part-based human pose estimation model with pseudo-velocity regression. Our model has a two-stage structure: the first stage predicts joint velocities from IMU measurements, and the second stage predicts the rotations of all body joints. We also partition the human body and the attached IMU sensors into three local regions, which are fed to the part-based model to estimate each region's pose while maintaining global coherence. This multi-stage, part-based design improves the accuracy and consistency of pose estimation.
Figure 3: Evaluation of end-effector velocity across various motions. After mapping the joints' global orientations from Xsens to SMPL, there is no significant discrepancy in the end-effector velocities.
Figure 4: Qualitative comparisons on the DIP-IMU [huang2018deep] test set.
Figure 5: Qualitative results: SIP error box plots for three competing methods and DynaIP on the Natural Motion [geissinger2020motion] dataset.
...and 7 more figures
