EnvPoser: Environment-aware Realistic Human Motion Estimation from Sparse Observations with Uncertainty Modeling

Songpengcheng Xia, Yu Zhang, Zhuo Su, Xiaozheng Zheng, Zheng Lv, Guidong Wang, Yongjie Zhang, Qi Wu, Lei Chu, Ling Pei

TL;DR

EnvPoser addresses full-body motion estimation from sparse HMD/hand signals by introducing a two-stage, environment-aware framework that explicitly models multi-hypothesis uncertainty and refines results with semantic and geometric scene constraints. Stage I produces uncertainty-aware initial estimates to capture multiple plausible motions, while Stage II leverages a cropped environmental point cloud and cross-attention to enforce realistic interactions and prevent collisions. The approach achieves state-of-the-art results on EgoBody and GIMO, reducing both angular and positional errors and improving motion smoothness, particularly in environments with frequent interactions. By incorporating environmental context and uncertainty quantification, EnvPoser offers robust, realistic motion estimation for VR/AR applications and lays groundwork for dynamic-scene extensions and richer sensory integration.
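
As a rough illustration of the "cropped environmental point cloud" used in Stage II, the sketch below keeps only the scene points near a reference position such as the tracked head. The spherical crop, the 1 m radius, and the names `crop_point_cloud`, `scene`, and `head` are assumptions made for this example; the summary does not specify the crop shape, size, or sampling strategy, so this should not be read as the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def crop_point_cloud(points: Sequence[Point], center: Point, radius: float) -> List[Point]:
    """Keep only the scene points within `radius` of `center` (hypothetical helper)."""
    if radius < 0:
        raise ValueError("radius must be non-negative")
    # math.dist computes the Euclidean distance between two points (Python 3.8+).
    return [p for p in points if math.dist(p, center) <= radius]


# Toy scene with four points; coordinates are illustrative, in metres.
scene = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
head = (0.0, 0.0, 0.0)  # assumed crop centre, e.g. the HMD position

# Assumed 1 m radius: the point 3 m away is discarded.
local_scene = crop_point_cloud(scene, head, radius=1.0)
```

A real pipeline would more likely operate on array data and might crop with a box or per-joint neighbourhoods instead; the point here is only that the refinement stage sees a local subset of the pre-scanned scene, not the whole scan.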

Abstract

Estimating full-body motion using the tracking signals of head and hands from VR devices holds great potential for various applications. However, the sparsity and unique distribution of observations present a significant challenge, resulting in an ill-posed problem with multiple feasible solutions (i.e., hypotheses). This amplifies uncertainty and ambiguity in full-body motion estimation, especially for the lower-body joints. Therefore, we propose a new method, EnvPoser, that employs a two-stage framework to perform full-body motion estimation using sparse tracking signals and a pre-scanned environment from VR devices. EnvPoser models the multi-hypothesis nature of human motion through an uncertainty-aware estimation module in the first stage. In the second stage, we refine these multi-hypothesis estimates by integrating semantic and geometric environmental constraints, ensuring that the final motion estimation aligns realistically with both the environmental context and physical interactions. Qualitative and quantitative experiments on two public datasets demonstrate that our method achieves state-of-the-art performance, highlighting significant improvements in human motion estimation within motion-environment interaction scenarios.

Paper Structure

This paper contains 18 sections, 12 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: EnvPoser can estimate the full-body motion using three tracking signals (HMD and hand controllers) and a pre-scanned environment mesh.
  • Figure 1: Environmental point cloud with different sampling strategies.
  • Figure 2: Overview of EnvPoser: A Two-Stage Motion Estimation Model. Stage I involves training the uncertainty-aware initial estimation module on the AMASS dataset to produce initial motion estimates with uncertainty quantification. Stage II refines these estimates by training on motion-environment datasets, incorporating semantic and geometric environmental constraints.
  • Figure 2: Qualitative results of lower-body MPJPE box plot for ablation study on GIMO dataset.
  • Figure 3: Visualization of motion estimation on three test sequences from the EgoBody dataset (zhang2022egobody).
  • ...and 4 more figures