A Lightweight Human Pose Estimation Approach for Edge Computing-Enabled Metaverse with Compressive Sensing
Nguyen Quang Hieu, Dinh Thai Hoang, Diep N. Nguyen
TL;DR
The paper tackles lightweight reconstruction of 3D human poses from compressed IMU signals transmitted over noisy wireless links for edge-enabled Metaverse applications. It introduces CS-VAE, a pipeline that combines a Gaussian measurement matrix with a variational auto-encoder at the receiver, leveraging the set-restricted eigenvalue condition (S-REC) to enable reliable recovery from compressed, noisy data. The authors derive a measurement design that satisfies per-channel power constraints while preserving recoverability, and they train the generator via a Lagrangian objective, demonstrating competitive pose reconstruction on the DIP-IMU dataset using only about 82% of the original measurements with substantially lower latency than optimization-based baselines. This approach promises scalable, privacy-aware, and edge-friendly 3D pose estimation for real-time Metaverse experiences.
Abstract
The ability to estimate 3D movements of users over edge computing-enabled networks, such as 5G/6G networks, is a key enabler for the new era of extended reality (XR) and Metaverse applications. Recent advancements in deep learning have shown advantages over optimization techniques for estimating 3D human poses given sparse measurements from sensor signals, i.e., inertial measurement unit (IMU) sensors attached to the XR devices. However, existing works lack applicability to wireless systems, where transmitting the IMU signals over noisy wireless networks poses significant challenges. Furthermore, the potential redundancy of the IMU signals has not been considered, resulting in highly redundant transmissions. In this work, we propose a novel approach for redundancy removal and lightweight transmission of IMU signals over noisy wireless environments. Our approach uses a random Gaussian matrix to transform the original signal into a lower-dimensional space. Leveraging compressive sensing theory, we prove that the designed Gaussian matrix can project the signal into a lower-dimensional space while preserving the Set-Restricted Eigenvalue condition, subject to a transmit power constraint. Furthermore, we develop a deep generative model at the receiver to recover the original IMU signals from the noisy compressed data, thus enabling the creation of 3D human body movements at the receiver for XR and Metaverse applications. Simulation results on a real-world IMU dataset show that our framework can achieve highly accurate 3D human poses of the user using only $82\%$ of the measurements from the original signals. This accuracy is comparable to that of an optimization-based approach, i.e., Lasso, while our framework is an order of magnitude faster.
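
The transmitter side of the pipeline described above (random Gaussian projection, a power limit, and a noisy channel) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the signal length `n = 100`, the `N(0, 1/m)` entry scaling, the average-power rescaling rule, and the additive Gaussian noise model are all assumptions chosen for illustration. In particular, the simple rescaling here only stands in for the paper's per-channel power constraint, and the generative-model decoder at the receiver is omitted.

```python
import numpy as np


def gaussian_measurement_matrix(m, n, rng):
    """Draw an m x n matrix with i.i.d. N(0, 1/m) entries.

    The 1/m variance is a common compressive-sensing scaling and is an
    assumption here, not the paper's derived design.
    """
    return rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))


def compress_and_transmit(x, A, power_budget, noise_std, rng):
    """Project x to a lower dimension, enforce a power budget, add channel noise.

    Returns the noisy received vector and the scale factor that was applied.
    """
    y = A @ x  # compressed measurements, length m < n

    # Hypothetical power control: shrink y if its average power per
    # measurement exceeds the budget; never amplify.
    avg_power = np.mean(y ** 2)
    scale = 1.0
    if avg_power > power_budget:
        scale = np.sqrt(power_budget / avg_power)
    y_tx = scale * y

    # Additive white Gaussian noise as a stand-in for the wireless channel.
    noise = rng.normal(0.0, noise_std, size=y_tx.shape)
    return y_tx + noise, scale


rng = np.random.default_rng(0)
n = 100                    # hypothetical length of one IMU signal vector
m = 82                     # keep 82% of the original dimension
x = rng.standard_normal(n)  # placeholder signal, not real IMU data

A = gaussian_measurement_matrix(m, n, rng)
y_rx, scale = compress_and_transmit(x, A, power_budget=1.0, noise_std=0.05, rng=rng)
tx_power = np.mean((scale * (A @ x)) ** 2)
```

The receiver would then pass `y_rx` (together with knowledge of `A`) to a recovery routine, which in the paper is a deep generative model and in the baseline is Lasso.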
