Full-Body Motion Reconstruction with Sparse Sensing from Graph Perspective

Feiyu Yao; Zongkai Wu; Li Yi

TL;DR

This work tackles full-body motion reconstruction from sparse VR sensing by introducing a Body Pose Graph (BPG) that recovers the joints the sensors do not observe through graph-based reasoning. It fuses translation and rotation cues, captures temporal dynamics with a Temporal Pyramid Structure, and separates trunk and limb spatial features to initialize a graph of twenty-two joints. These features are then refined by a Graph Convolution Network whose expressive edges are learned from static, dynamic, and latent relations. The approach achieves state-of-the-art results on multiple datasets, notably improving lower-body accuracy, and ablation studies support the contribution of each component, including the temporal, spatial, and symmetric constraints. The method targets realistic avatar rendering in AR/VR with widely available sparse sensors, with implications for real-time full-body reconstruction.

Abstract

Estimating 3D full-body pose from sparse sensor data is a pivotal technique for reconstructing realistic human motion in Augmented Reality and Virtual Reality. However, translating sparse sensor signals into comprehensive human motion remains a challenge, since the sparsely distributed sensors in common VR systems fail to capture the motion of the full human body. In this paper, we use a well-designed Body Pose Graph (BPG) to represent the human body and recast the challenge as the problem of predicting missing nodes in a graph. We then propose a novel full-body motion reconstruction framework based on the BPG. To establish the BPG, nodes are first assigned features extracted from the sparse sensor signals. Features from identifiable joint nodes across the different sensors are fused and processed from both temporal and spatial perspectives. Temporal dynamics are captured with the Temporal Pyramid Structure, while spatial relations in joint movements inform the spatial attributes. The resulting features serve as the initial features of the BPG nodes. To further refine the BPG, node features are updated by a graph neural network whose edges reflect varying joint relations. Our method achieves state-of-the-art performance, particularly on lower-body motion, and outperforms the baseline methods. Additionally, an ablation study validates the efficacy of each module in the proposed framework.
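To make the "missing nodes" framing concrete, the following is a minimal sketch of one graph-convolution step on a toy graph, in which a node with no features of its own receives information from its neighbors. It is not the authors' architecture: the 4-node chain, the zero initialization of the unobserved node, the identity weight matrix, and the helper names `normalize_rows` and `graph_conv` are all assumptions chosen for readability. In the paper, every node is initialized from sensor-derived features and the edges are learned.

```python
# Toy illustration of feature propagation on a graph.
# Every name and size here is hypothetical, not taken from the paper.

def normalize_rows(adj):
    """Row-normalize an adjacency matrix that already contains self-loops."""
    out = []
    for row in adj:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out

def graph_conv(adj_norm, feats, weight):
    """One layer H' = ReLU(A_norm @ H @ W), written with plain lists."""
    n, d_in, d_out = len(feats), len(weight), len(weight[0])
    # Aggregate neighbor features: A_norm @ H
    agg = [[sum(adj_norm[i][k] * feats[k][j] for k in range(n))
            for j in range(d_in)] for i in range(n)]
    # Linear transform followed by ReLU: (A_norm @ H) @ W
    return [[max(0.0, sum(agg[i][k] * weight[k][j] for k in range(d_in)))
             for j in range(d_out)] for i in range(n)]

# A 4-node chain 0-1-2-3 with self-loops; node 3 plays the unobserved joint.
adj = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]
# Node 3 starts with an all-zero feature vector (a simplification for this toy).
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # identity weights keep the arithmetic visible

updated = graph_conv(normalize_rows(adj), feats, identity)
print(updated[3])  # node 3 now carries the average of itself and node 2
```

After one step, the previously empty node holds a feature vector derived from its neighbor. Stacking several such layers lets information from the observed joints reach joints further along the graph.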

Paper Structure (18 sections, 12 equations, 5 figures, 4 tables)

Introduction
Related Work
Full-Body Motion Reconstruction From Sparse Inputs
Graph Neural Networks
Graph Neural Networks in Human Pose Estimation
Methods
Problem Formulation
Node Feature Initialization
Node Feature Updating
Training and Loss
Experiment
Data Preparation and Evaluation Metrics
Performance Comparison With Baseline Method
Performance Comparison With Offline Method
Ablation Study
...and 3 more sections

Figures (5)

Figure 1: Illustration of our proposed structure. The inputs are sparse sensor position and rotation signals from a VR system. The Feature Integration module fuses position and rotation features, which have different physical properties, through interactive learning. In the Node Property Generation module, the temporal property of the motion is obtained through the Temporal Pyramid module. To obtain the spatial property of the motion, the limb motion feature is composed of trunk motion features and limb local motion features. The trunk and limb features then serve as the initial node features of the Body Pose Graph. In Node Feature Updating, a graph convolution network with different edges modeling different joint relations is applied to update the nodes.
Figure 2: Temporal Pyramid Structure
Figure 3: Index of human body joints
Figure 4: Visualization of estimated poses on an avatar over a series of frames showing a front kick. The top row shows avatars with ground truth (GT) poses, and the following two rows show avatars generated by our approach and by AvatarPoser. The avatars are color-coded to indicate the error of each mesh.
Figure 5: The left diagram shows the 0-1 adjacency matrix of the skeletal connectivity of the human body. The right diagram shows an adjacency matrix generated by the GCN with expressive edges. The deeper the color, the stronger the relationship between the nodes. Red indicates positive correlation, while blue indicates negative correlation.
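
As a rough illustration of the 0-1 skeletal adjacency matrix described in the Figure 5 caption, the sketch below builds one from a parent-index list. The joint ordering is an assumption: it uses the kinematic tree of the first 22 SMPL body joints, which may not match the indexing in Figure 3. The function name `skeleton_adjacency` is hypothetical.

```python
# Assumed SMPL-style kinematic tree for 22 body joints.
# PARENTS[j] is the parent of joint j; -1 marks the root (pelvis).
PARENTS = [-1, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7,
           8, 9, 9, 9, 12, 13, 14, 16, 17, 18, 19]

def skeleton_adjacency(parents):
    """Return a symmetric 0-1 matrix with a 1 for each bone (parent-child pair)."""
    n = len(parents)
    adj = [[0] * n for _ in range(n)]
    for child, parent in enumerate(parents):
        if parent >= 0:
            adj[child][parent] = 1
            adj[parent][child] = 1
    return adj

adj = skeleton_adjacency(PARENTS)
num_bones = sum(map(sum, adj)) // 2  # each bone is counted twice in a symmetric matrix
print(len(adj), num_bones)
```

A matrix of this kind encodes only direct bone connections. The learned matrix in the right diagram of Figure 5 differs in that its entries are real-valued, can be negative, and can link joints that share no bone.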
