End-to-End Human Pose Reconstruction from Wearable Sensors for 6G Extended Reality Systems

Nguyen Quang Hieu, Dinh Thai Hoang, Diep N. Nguyen, Mohammad Abu Alsheikh, Carlos C. N. Kuhn, Yibeltal F. Alem, Ibrahim Radwan

TL;DR

The paper addresses robust 3D human pose reconstruction for XR over 6G by transmitting inertial measurement unit (IMU) data through an OFDM uplink and receiving it with a two-stage deep learning system. The neural receiver jointly estimates the wireless channel and decodes OFDM symbols, and a subsequent IMU mapper converts the decoded signals to SMPL pose parameters, enabling end-to-end pose reconstruction under realistic channel impairments and quantization. Results show a gain of about $5$ dB in signal-to-noise ratio at a BER of $10^{-4}$ over a baseline that combines least squares (LS) channel estimation with linear minimum mean square error (LMMSE) equalization. They also indicate that $q=8$ bits of IMU quantization suffice for high-fidelity poses, with MSE around $5\times10^{-4}$ and mean per-joint angular error (MPJAE) below $0.01^{\circ}$ for $q\geq7$, validated on ray-traced site-specific channels. The work highlights the viability of seamless XR experiences over 6G by integrating channel estimation, symbol decoding, and pose synthesis in an end-to-end framework, and provides a dataset created with ray-traced OFDM channels to support reproducibility and further research.

Abstract

Full 3D human pose reconstruction is a critical enabler for extended reality (XR) applications in future sixth generation (6G) networks, supporting immersive interactions in gaming, virtual meetings, and remote collaboration. However, achieving accurate pose reconstruction over wireless networks remains challenging due to channel impairments, bit errors, and quantization effects. Existing approaches often assume error-free transmission in indoor settings, limiting their applicability to real-world scenarios. To address these challenges, we propose a novel deep learning-based framework for human pose reconstruction over orthogonal frequency-division multiplexing (OFDM) systems. The framework introduces a two-stage deep learning receiver: the first stage jointly estimates the wireless channel and decodes OFDM symbols, and the second stage maps the received sensor signals to full 3D body poses. Simulation results demonstrate that the proposed neural receiver reduces the bit error rate (BER), achieving a 5 dB gain at a BER of $10^{-4}$ over the baseline method that employs separate signal detection steps, i.e., least squares channel estimation and linear minimum mean square error equalization. Additionally, our empirical findings show that 8-bit quantization is sufficient for accurate pose reconstruction, achieving a mean squared error of $5\times10^{-4}$ for reconstructed sensor signals and reducing joint angular error by 37% for the reconstructed human poses compared to the baseline.
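The quantization step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a uniform quantizer over a normalized range of [-1, 1] and a plain binary bit mapping; the paper's actual quantizer, value range, and bit mapping are not stated here, and all function names below are hypothetical. The error it reports covers quantization only, not the end-to-end error after transmission.

```python
import random

def quantize(x, q, lo=-1.0, hi=1.0):
    """Map a float to an integer code in [0, 2**q - 1] (uniform bins, assumed range)."""
    levels = 2 ** q
    step = (hi - lo) / levels
    x = min(max(x, lo), hi)                  # clip to the assumed sensor range
    return min(int((x - lo) / step), levels - 1)

def dequantize(code, q, lo=-1.0, hi=1.0):
    """Reconstruct the value at the centre of the quantization bin."""
    step = (hi - lo) / (2 ** q)
    return lo + (code + 0.5) * step

def to_bits(code, q):
    """Most-significant-bit-first list of q bits, ready for modulation."""
    return [(code >> i) & 1 for i in range(q - 1, -1, -1)]

def from_bits(bits):
    """Inverse of to_bits."""
    code = 0
    for b in bits:
        code = (code << 1) | b
    return code

def quantization_mse(samples, q):
    """Mean squared error introduced by quantizing and de-quantizing the samples."""
    err = [(s - dequantize(quantize(s, q), q)) ** 2 for s in samples]
    return sum(err) / len(err)

if __name__ == "__main__":
    rng = random.Random(0)
    samples = [rng.uniform(-1.0, 1.0) for _ in range(10_000)]  # stand-in for IMU readings
    for q in (4, 6, 8):
        print(q, quantization_mse(samples, q))
```

Each extra bit halves the bin width, so the quantization error alone shrinks quickly with $q$; in the full system, bit errors on the channel add to this floor.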

Paper Structure

This paper contains 18 sections, 14 equations, 10 figures, 1 table, 2 algorithms.

Figures (10)

  • Figure 1: Our single-user OFDM system, in which the user is equipped with a single antenna placed at the XR headset [duru2024pose]. The transmitter sends IMU signals over the uplink channel to the receiver. The IMU signals are quantized before transmission over the OFDM channel. The OFDM channel is modelled with the ray-tracing propagation method in the Sionna library [sionna]. Finally, the receiver decodes the OFDM symbols into IMU signals and then maps the IMU signals into specific human body poses.
  • Figure 2: Neural receiver approach to decode the received OFDM resource grid into information bits. Unlike a conventional receiver, which requires multiple processing blocks (channel estimation, equalization, and demodulation), the neural receiver jointly learns the parameters through supervised learning. At test time, the pre-trained neural receiver is capable of performing signal detection in real time.
  • Figure 3: (a) A 3D map of an area in Munich (Germany) with propagation paths created using the Sionna Ray Tracing toolkit. (b) The channel impulse response realization of the paths in (a).
  • Figure 4: Illustration of two pilot configurations, "2P" and "1P", in an OFDM resource grid. The 2P configuration uses two pilot OFDM symbols (indices 2 and 12) for channel estimation, while the 1P configuration uses a single pilot OFDM symbol (index 2) [aoudia2021end]. The resource grid also includes data subcarriers and masked subcarriers, as indicated in the legend.
  • Figure 5: Overview of the two-stage deep learning-based receiver for human pose reconstruction. (a) In the offline training phase, the IMU receiver is trained with raw IMU signals and ground truth human pose labels. (b) In the online signal detection and inference phase, the IMU signals are quantized and modulated before transmission. The two-stage deep learning-based receiver then performs signal detection, de-quantization, and human pose prediction in real time, since no backpropagation occurs in this stage.
  • ...and 5 more figures
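
The conventional baseline named in the abstract (least squares channel estimation followed by LMMSE equalization) and the two pilot configurations of Figure 4 can be sketched for a single subcarrier as follows. This is an illustrative textbook version, not the authors' implementation: the 14-symbol slot length, the linear interpolation between the two pilots, the unit-energy symbol assumption, and all names are assumptions introduced here.

```python
# Pilot OFDM symbol indices from the Figure 4 caption.
PILOT_SYMBOLS = {"2P": (2, 12), "1P": (2,)}

def ls_channel_estimate(y, pilot, config):
    """LS estimate h = y / x at the pilot symbols of one subcarrier,
    spread over all OFDM symbols (held constant for 1P, linearly
    interpolated/extrapolated in time for 2P; the latter is an assumption)."""
    idx = PILOT_SYMBOLS[config]
    h_p = [y[t] / pilot for t in idx]
    if len(idx) == 1:
        return [h_p[0]] * len(y)
    t0, t1 = idx
    slope = (h_p[1] - h_p[0]) / (t1 - t0)
    return [h_p[0] + slope * (t - t0) for t in range(len(y))]

def lmmse_equalize(y_t, h_hat, noise_var):
    """Scalar LMMSE equalizer for unit-energy symbols:
    x_hat = conj(h) * y / (|h|^2 + noise_var)."""
    return h_hat.conjugate() * y_t / (abs(h_hat) ** 2 + noise_var)

if __name__ == "__main__":
    num_symbols = 14                          # assumed slot length
    pilot = (1 + 1j) / 2 ** 0.5               # known QPSK pilot value
    # Slowly drifting channel; noise-free, and every symbol carries the
    # pilot value so the estimate can be compared against the truth directly.
    h_true = [(0.8 - 0.6j) + 0.01j * t for t in range(num_symbols)]
    y = [h * pilot for h in h_true]
    for config in PILOT_SYMBOLS:
        h_hat = ls_channel_estimate(y, pilot, config)
        worst = max(abs(a - b) for a, b in zip(h_hat, h_true))
        print(config, "max channel estimation error:", worst)
```

With a channel that drifts over the slot, the single-pilot configuration cannot track the change while the two-pilot configuration can, which is the trade-off between pilot overhead and estimation quality that the two configurations represent.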