
HMD-Poser: On-Device Real-time Human Motion Tracking from Scalable Sparse Observations

Peng Dai, Yang Zhang, Tao Liu, Zhen Fan, Tianyuan Du, Zhuo Su, Xiaozheng Zheng, Zeming Li

TL;DR

HMD-Poser is the first unified approach to recover full-body motions using scalable sparse observations from an HMD and body-worn IMUs, and it achieves new state-of-the-art results in both accuracy and real-time performance.

Abstract

It is especially challenging to achieve real-time human motion tracking on a standalone VR Head-Mounted Display (HMD) such as Meta Quest and PICO. In this paper, we propose HMD-Poser, the first unified approach to recover full-body motions using scalable sparse observations from an HMD and body-worn IMUs. In particular, it supports a variety of input scenarios, such as HMD, HMD+2IMUs, HMD+3IMUs, etc. This scalability of inputs lets users choose their own trade-off between high tracking accuracy and ease of wear. A lightweight temporal-spatial feature learning network is proposed in HMD-Poser to guarantee that the model runs in real time on HMDs. Furthermore, HMD-Poser presents online body shape estimation to improve the position accuracy of body joints. Extensive experimental results on the challenging AMASS dataset show that HMD-Poser achieves new state-of-the-art results in both accuracy and real-time performance. We also build a new free-dancing motion dataset to evaluate HMD-Poser's on-device performance and investigate the performance gap between synthetic data and real-captured sensor data. Finally, we demonstrate HMD-Poser with a real-time Avatar-driving application on a commercial HMD. Our code and free-dancing motion dataset are available at https://pico-ai-team.github.io/hmd-poser

Paper Structure (19 sections, 3 equations, 5 figures, 6 tables)

Figures (5)

  • Figure 1: HMD-Poser can handle scalable input scenarios, including (a) HMD, (b) HMD+2IMUs wherein two IMUs are worn on the lower legs, (c) HMD+3IMUs wherein a third IMU is added to the pelvis, etc. HMD-Poser runs on HMD and outputs full-body motion data to drive an Avatar in real-time.
  • Figure 2: Overview of HMD-Poser. At each time step $t$, each component in the input data $x^t$ (see Eq. \ref{eq:input_feat}) is first mapped to a higher-dimensional embedding feature $f^t$ via the feature embedding module. Then, a lightweight temporal-spatial feature learning network is adopted to generate representations with rich temporal and spatial correlation information. Next, two regression heads regress the local pose parameters $\theta^t$ and the shape parameters $\beta^t$ of SMPL, respectively. Finally, a forward-kinematics (FK) module is adopted to calculate the global poses and positions of all joints, which are used to drive an Avatar in real time.
  • Figure 3: Qualitative comparisons between our method and state-of-the-art methods in the HMD setting. When comparing with methods in this category, HMD-Poser uses the HMD input scenario for a fair comparison.
  • Figure 4: Qualitative comparisons between our method and 6IMUs-based methods. For a fair comparison, we provide head and hand positions to the baselines and compare them with our method under the HMD+3IMUs input scenario.
  • Figure 5: Results of real-time Avatar-driving on a PICO 4 HMD.
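
The forward-kinematics (FK) step named in the Figure 2 caption can be illustrated with a minimal sketch. This is a generic FK routine on a toy three-joint chain, not the authors' implementation: the function names (`rot_z`, `forward_kinematics`), the skeleton, and the bone offsets are all hypothetical, and a real SMPL skeleton would take its joint offsets from the body model and the estimated shape parameters. The sketch only shows how local joint rotations are chained along a kinematic tree to obtain global rotations and joint positions.

```python
import numpy as np


def rot_z(angle_rad):
    """Rotation matrix for a rotation of angle_rad about the z-axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def forward_kinematics(local_rots, offsets, parents, root_pos):
    """Chain local rotations along a kinematic tree.

    local_rots: (J, 3, 3) rotation of each joint relative to its parent.
    offsets:    (J, 3) rest-pose offset of each joint from its parent.
    parents:    parent index per joint, -1 for the root. Assumes every
                parent appears before its children in the list.
    root_pos:   (3,) global position of the root joint.
    """
    num_joints = len(parents)
    global_rots = np.zeros((num_joints, 3, 3))
    global_pos = np.zeros((num_joints, 3))
    for j in range(num_joints):
        p = parents[j]
        if p < 0:
            # Root joint: its local rotation is already global.
            global_rots[j] = local_rots[j]
            global_pos[j] = root_pos
        else:
            # Child joint: compose with the parent's global rotation and
            # place the joint at the rotated rest-pose offset.
            global_rots[j] = global_rots[p] @ local_rots[j]
            global_pos[j] = global_pos[p] + global_rots[p] @ offsets[j]
    return global_rots, global_pos


# Hypothetical toy skeleton: a straight chain of two unit-length bones along x.
parents = [-1, 0, 1]
offsets = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
local_rots = np.stack([rot_z(np.pi / 2), np.eye(3), rot_z(-np.pi / 2)])
root_pos = np.zeros(3)

global_rots, global_pos = forward_kinematics(local_rots, offsets, parents, root_pos)
print(np.round(global_pos, 3))
```

Rotating the root by 90 degrees about z swings the whole chain from the x-axis onto the y-axis, so the two child joints land at (0, 1, 0) and (0, 2, 0). The last joint's local rotation of -90 degrees cancels the root rotation, leaving its global rotation as the identity.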