End-to-End Human Pose Reconstruction from Wearable Sensors for 6G Extended Reality Systems
Nguyen Quang Hieu, Dinh Thai Hoang, Diep N. Nguyen, Mohammad Abu Alsheikh, Carlos C. N. Kuhn, Yibeltal F. Alem, Ibrahim Radwan
TL;DR
The paper addresses robust 3D human pose reconstruction for XR over 6G: inertial measurement unit (IMU) data are transmitted over an OFDM uplink and recovered by a two-stage deep learning receiver. The first stage, a neural receiver, jointly estimates the wireless channel and decodes the OFDM symbols. The second stage, an IMU mapper, converts the decoded signals into SMPL pose parameters, so poses are reconstructed end to end under realistic channel impairments and quantization. The neural receiver achieves a gain of about $5$ dB at a BER of $10^{-4}$ over a baseline that combines least squares (LS) channel estimation with linear minimum mean square error (LMMSE) equalization. With $q=8$ bits of IMU quantization, the system reaches an MSE of around $5\times10^{-4}$ on the reconstructed sensor signals, and the mean per-joint angular error (MPJAE) stays below $0.01^{\circ}$ for $q\geq7$. All results are validated on ray-traced, site-specific channels. The work integrates channel estimation, symbol decoding, and pose synthesis in a single end-to-end framework, and provides a dataset built from ray-traced OFDM channels to support reproducibility and further research.
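The effect of the quantization depth $q$ can be illustrated with a minimal sketch, assuming a plain uniform quantizer over a fixed range and synthetic data in place of real IMU readings. The function names, the $[-1, 1]$ range, and the random test signal are illustrative assumptions, not the paper's pipeline. The sketch also isolates quantization error only, so its MSE is not comparable to the end-to-end figure quoted above, which includes channel effects.

```python
import numpy as np


def quantize_uniform(x, q, lo=-1.0, hi=1.0):
    """Map values in [lo, hi] to integer codes in [0, 2**q - 1] (assumed scheme)."""
    levels = 2 ** q
    x = np.clip(x, lo, hi)
    return np.round((x - lo) / (hi - lo) * (levels - 1)).astype(np.int64)


def dequantize_uniform(codes, q, lo=-1.0, hi=1.0):
    """Invert quantize_uniform up to the rounding error."""
    levels = 2 ** q
    return lo + codes / (levels - 1) * (hi - lo)


# Synthetic stand-in for normalized IMU samples (hypothetical data, 6 channels).
rng = np.random.default_rng(0)
imu = rng.uniform(-1.0, 1.0, size=(1000, 6))

# Quantization-only MSE for several bit depths.
mse_by_q = {}
for q in (4, 6, 8):
    reconstructed = dequantize_uniform(quantize_uniform(imu, q), q)
    mse_by_q[q] = float(np.mean((imu - reconstructed) ** 2))
    print(f"q={q}: quantization MSE = {mse_by_q[q]:.3e}")
```

For a uniform quantizer the error is bounded by half a step, and each extra bit roughly halves the step size, which is why the error falls quickly as $q$ grows.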
Abstract
Full 3D human pose reconstruction is a critical enabler for extended reality (XR) applications in future sixth-generation (6G) networks, supporting immersive interactions in gaming, virtual meetings, and remote collaboration. However, achieving accurate pose reconstruction over wireless networks remains challenging due to channel impairments, bit errors, and quantization effects. Existing approaches often assume error-free transmission in indoor settings, limiting their applicability to real-world scenarios. To address these challenges, we propose a novel deep learning-based framework for human pose reconstruction over orthogonal frequency-division multiplexing (OFDM) systems. The framework introduces a two-stage deep learning receiver: the first stage jointly estimates the wireless channel and decodes OFDM symbols, and the second stage maps the received sensor signals to full 3D body poses. Simulation results demonstrate that the proposed neural receiver reduces the bit error rate (BER), achieving a 5 dB gain at a BER of $10^{-4}$ over a baseline that performs signal detection in separate steps, namely least squares channel estimation followed by linear minimum mean square error equalization. Additionally, our empirical findings show that 8-bit quantization is sufficient for accurate pose reconstruction, achieving a mean squared error of $5\times10^{-4}$ for reconstructed sensor signals and reducing the joint angular error of the reconstructed human poses by 37% compared to the baseline.
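The baseline named above, least squares channel estimation followed by linear minimum mean square error equalization, can be sketched in a few lines. This is a minimal illustration under assumptions not stated in this section: a single antenna, uncoded QPSK, one full pilot OFDM symbol per frame, a static 4-tap Rayleigh channel, and ideal cyclic-prefix handling so that each subcarrier sees a flat channel. The function `simulate_ber` and all its parameters are hypothetical names, and the numbers it produces are not the paper's results.

```python
import numpy as np


def qpsk_mod(bits):
    """Map bit pairs to unit-energy QPSK symbols (bit 0 -> +, bit 1 -> -)."""
    b = bits.reshape(-1, 2)
    return ((1 - 2 * b[:, 0]) + 1j * (1 - 2 * b[:, 1])) / np.sqrt(2)


def qpsk_demod(symbols):
    """Hard-decision QPSK demapper, inverse of qpsk_mod."""
    pairs = np.stack([symbols.real < 0, symbols.imag < 0], axis=-1)
    return pairs.astype(np.int64).reshape(-1)


def complex_noise(rng, shape, variance):
    """Circularly symmetric complex Gaussian samples with the given variance."""
    scale = np.sqrt(variance / 2)
    return scale * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))


def simulate_ber(snr_db, n_subcarriers=64, n_symbols=200, n_taps=4, seed=0):
    """BER of LS estimation + LMMSE equalization on a toy OFDM link (assumed setup)."""
    rng = np.random.default_rng(seed)
    n0 = 10.0 ** (-snr_db / 10.0)  # noise variance for unit-energy symbols

    # Static multipath channel; its FFT is the per-subcarrier frequency response.
    taps = complex_noise(rng, n_taps, 1.0 / n_taps)
    h = np.fft.fft(taps, n_subcarriers)

    pilot = qpsk_mod(rng.integers(0, 2, 2 * n_subcarriers))
    bits = rng.integers(0, 2, 2 * n_subcarriers * n_symbols)
    x = qpsk_mod(bits).reshape(n_symbols, n_subcarriers)

    y_pilot = h * pilot + complex_noise(rng, n_subcarriers, n0)
    y = h * x + complex_noise(rng, x.shape, n0)

    # Step 1: least squares channel estimate from the known pilot symbol.
    h_ls = y_pilot / pilot
    # Step 2: per-subcarrier LMMSE equalization using the estimated channel.
    x_hat = np.conj(h_ls) * y / (np.abs(h_ls) ** 2 + n0)

    bits_hat = qpsk_demod(x_hat.reshape(-1))
    return float(np.mean(bits_hat != bits))


for snr_db in (0.0, 10.0, 20.0, 30.0):
    print(f"SNR {snr_db:4.1f} dB: BER = {simulate_ber(snr_db):.4f}")
```

The two steps are decoupled: noise in the pilot-based estimate `h_ls` carries straight into the equalizer, and no later stage can correct it. A receiver that estimates the channel and decodes symbols jointly, as the first stage described above does, is meant to avoid that error propagation.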
