MobilePoser: Real-Time Full-Body Pose Estimation and 3D Human Translation from IMUs in Mobile Consumer Devices

Vasco Xu, Chenfeng Gao, Henry Hoffmann, Karan Ahuja

TL;DR

This work introduces MobilePoser, a real-time system for full-body pose and global translation estimation that uses any available subset of the IMUs already present in consumer devices such as phones, watches, and earbuds. It employs a multi-stage deep neural network for kinematic pose estimation followed by a physics-based motion optimizer, achieving state-of-the-art accuracy while remaining lightweight.

Abstract

There has been a continued trend towards minimizing instrumentation for full-body motion capture, going from specialized rooms and equipment, to arrays of worn sensors, and recently to sparse inertial pose capture methods. However, as these techniques migrate towards lower-fidelity IMUs on ubiquitous commodity devices, like phones, watches, and earbuds, challenges arise, including compromised online performance, reduced temporal consistency, and loss of global translation due to sensor noise and drift. Addressing these challenges, we introduce MobilePoser, a real-time system for full-body pose and global translation estimation using any available subset of IMUs already present in these consumer devices. MobilePoser employs a multi-stage deep neural network for kinematic pose estimation followed by a physics-based motion optimizer, achieving state-of-the-art accuracy while remaining lightweight. We conclude with a series of demonstrative applications to illustrate the unique potential of MobilePoser across a variety of fields, such as health and wellness, gaming, and indoor navigation.

Paper Structure

This paper contains 40 sections, 3 equations, 11 figures, 3 tables.

Figures (11)

  • Figure 1: Real-time global pose estimation powered by MobilePoser: (A) Person with smartwatch (left wrist) waving their hands. (B) Person with smartwatch (left wrist) performing jumping jacks. (C) Person wearing a smartwatch (left wrist) and carrying a phone in their right pocket running.
  • Figure 2: MobilePoser system overview. MobilePoser accepts any available subset of IMU data from the user and masks absent devices by setting their values to zero. The IMU data is then fed into two main modules: (1) Pose Estimation, which first estimates joint positions followed by joint rotations, and (2) Translation Estimation, which combines foot-ground contact probabilities with a direct neural network-based approach to regress global velocity. Finally, a Physics Optimizer refines the predicted joint rotations and global translation to ensure they satisfy physical constraints.
  • Figure 3: Demonstration of the physics optimizer's ability to reduce foot-ground penetration.
  • Figure 4: Comparison of MobilePoser's full-body pose estimation error across different evaluation protocols on the DIP-IMU, IMUPoser, and TotalCapture datasets, respectively.
  • Figure 5: Qualitative comparisons between our method and IMUPoser on the DIP-IMU and IMUPoser datasets.
  • ...and 6 more figures
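
The Figure 2 caption notes that MobilePoser accepts any available subset of IMUs and masks absent devices by setting their values to zero. A minimal sketch of that masking step is shown below. The device slot names, their order, and the 12 features per device are assumptions made for illustration, and `mask_absent_devices` is a hypothetical helper, not the authors' code.

```python
# Hypothetical device slots; the names and their order are assumptions.
DEVICE_SLOTS = ("left_wrist", "right_wrist", "left_pocket", "right_pocket", "head")

# Assumed feature layout per device, e.g. 3 acceleration values plus a
# flattened 3x3 orientation matrix. The summary above does not specify this.
FEATURES_PER_DEVICE = 12


def mask_absent_devices(readings):
    """Return a fixed-length input vector with zeros for devices that are absent.

    `readings` maps a slot name to a sequence of FEATURES_PER_DEVICE numbers.
    Slots missing from `readings` are filled with zeros, so the network input
    has the same length regardless of which devices the user carries.
    """
    frame = []
    for name in DEVICE_SLOTS:
        values = readings.get(name)
        if values is None:
            # Absent device: mask by zero-filling its slot.
            frame.extend([0.0] * FEATURES_PER_DEVICE)
        else:
            if len(values) != FEATURES_PER_DEVICE:
                raise ValueError(f"{name}: expected {FEATURES_PER_DEVICE} values")
            frame.extend(float(v) for v in values)
    return frame


# Usage: a user wearing only a smartwatch on the left wrist.
watch_only = {"left_wrist": [1.0] * FEATURES_PER_DEVICE}
vector = mask_absent_devices(watch_only)
```

Keeping the vector length fixed lets a single model serve every device combination; the model has to learn to treat all-zero slots as "no device" rather than as a real reading.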