A Lightweight Human Pose Estimation Approach for Edge Computing-Enabled Metaverse with Compressive Sensing

Nguyen Quang Hieu, Dinh Thai Hoang, Diep N. Nguyen

TL;DR

The paper tackles lightweight reconstruction of 3D human poses from compressed IMU signals transmitted over noisy wireless links for edge-enabled Metaverse applications. It introduces CS-VAE, a pipeline that combines a Gaussian measurement matrix with a variational auto-encoder at the receiver, leveraging the set-restricted eigenvalue condition (S-REC) to enable reliable recovery from compressed, noisy data. The authors derive a measurement design that satisfies per-channel power constraints while preserving recoverability, and they train the generator via a Lagrangian objective, demonstrating competitive pose reconstruction on the DIP-IMU dataset using only about 82% of the original measurements with substantially lower latency than optimization-based baselines. This approach promises scalable, privacy-aware, and edge-friendly 3D pose estimation for real-time Metaverse experiences.

Abstract

The ability to estimate 3D movements of users over edge computing-enabled networks, such as 5G/6G networks, is a key enabler for the new era of extended reality (XR) and Metaverse applications. Recent advancements in deep learning have shown advantages over optimization techniques for estimating 3D human poses given sparse measurements from sensor signals, i.e., inertial measurement unit (IMU) sensors attached to the XR devices. However, existing works lack applicability to wireless systems, where transmitting the IMU signals over noisy wireless networks poses significant challenges. Furthermore, the potential redundancy of the IMU signals has not been considered, resulting in highly redundant transmissions. In this work, we propose a novel approach for redundancy removal and lightweight transmission of IMU signals over noisy wireless environments. Our approach utilizes a random Gaussian matrix to transform the original signal into a lower-dimensional space. By leveraging compressive sensing theory, we prove that the designed Gaussian matrix can project the signal into a lower-dimensional space and preserve the Set-Restricted Eigenvalue condition, subject to a power transmission constraint. Furthermore, we develop a deep generative model at the receiver to recover the original IMU signals from noisy compressed data, thus enabling the creation of 3D human body movements at the receiver for XR and Metaverse applications. Simulation results on a real-world IMU dataset show that our framework can achieve highly accurate 3D human poses of the user using only $82\%$ of the measurements from the original signals. This is comparable to an optimization-based approach, i.e., Lasso, but is an order of magnitude faster.
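
The transmitter side described above (Gaussian projection, power-limited transmission, noisy channel) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the frame length `n`, the synthetic sine signal, the noise level, and the helper names are all hypothetical. As a simplification, the compressed vector is rescaled after projection to meet the power budget, whereas the paper designs the distribution of the matrix entries so that the constraint holds.

```python
import math
import random

def gaussian_matrix(m, n, rng):
    """Return an m x n matrix with i.i.d. N(0, 1/m) entries
    (a common compressive sensing normalization, assumed here)."""
    std = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, std) for _ in range(n)] for _ in range(m)]

def project(A, x):
    """Linear projection y = A x onto the lower-dimensional space."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def scale_to_power(y, p_t):
    """Rescale y so its average per-symbol power equals p_t.
    Illustrative shortcut; not the paper's matrix design."""
    power = sum(v * v for v in y) / len(y)
    gain = math.sqrt(p_t / power)
    return [gain * v for v in y], gain

def awgn(y, noise_std, rng):
    """Add white Gaussian noise to model the wireless channel."""
    return [v + rng.gauss(0.0, noise_std) for v in y]

rng = random.Random(0)
n = 60                                      # hypothetical length of one IMU frame
m = round(0.82 * n)                         # keep about 82% of the measurements
x = [math.sin(0.3 * k) for k in range(n)]   # synthetic stand-in for IMU readings

A = gaussian_matrix(m, n, rng)
y, gain = scale_to_power(project(A, x), p_t=1.0)
y_noisy = awgn(y, noise_std=0.1, rng=rng)   # what the receiver observes

tx_power = sum(v * v for v in y) / m
print(f"compressed {n} -> {m} samples, average transmit power {tx_power:.3f}")
```

The receiver would then recover an estimate of `x` from `y_noisy`, `A`, and `gain`, either with the generative model the paper proposes or with an optimization baseline such as Lasso.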

Paper Structure (9 sections, 1 theorem, 12 equations, 5 figures)


Key Result

Proposition 1

The signal recovered by the generative model-based compressive sensing method under the power constraint $P_T$ is guaranteed to be a unique solution if (i) $\mathbf{A}$ satisfies the S-REC property, and (ii) each element $A_{ij}$ (the $j$-th element of the $i$-th row) of $\mathbf{A}$ is drawn i.i.d. where $\sigma_x^2$ and $\mu_x$ are the statistical variance and mean values of the source signals $

Figures (5)

  • Figure 1: The proposed system model. A linear projection is performed at the transmitter with compressive sensing. The down-sampled IMU signals are transmitted via a wireless channel. The receiver empowered with a generative model recovers the original IMU signals from the noisy compressed data. Finally, the reconstructed avatar from the recovered IMU signals can be used in the Metaverse's virtual world.
  • Figure 2: Top: Acceleration readings from an IMU sensor placed on the left wrist of the user in the dataset [huang2018deep]. Bottom: Fast Fourier Transform (FFT) of the acceleration data.
  • Figure 3: Reconstruction accuracy vs. number of measurements.
  • Figure 4: Reconstruction latency vs. number of inputs.
  • Figure 5: 3D reconstruction poses based on the reconstructed IMU signals.

Theorems & Definitions (3)

  • Definition 1: Restricted Eigenvalue Condition (REC)
  • Definition 2: Set-Restricted Eigenvalue Condition
  • Proposition 1: S-REC with power constraint
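
For reference, the set-restricted eigenvalue condition is commonly stated as below in the literature on compressive sensing with generative models. The symbols $S$, $\gamma$, $\delta$, $m$, and $n$ are assumptions introduced here, and the paper's Definition 2 may use different notation. A matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$ satisfies S-REC$(S, \gamma, \delta)$ for a set $S \subseteq \mathbb{R}^n$ and parameters $\gamma > 0$, $\delta \ge 0$ if

$$
\|\mathbf{A}(\mathbf{x}_1 - \mathbf{x}_2)\|_2 \;\ge\; \gamma \,\|\mathbf{x}_1 - \mathbf{x}_2\|_2 - \delta
\qquad \forall\, \mathbf{x}_1, \mathbf{x}_2 \in S .
$$

Intuitively, two signals in $S$ that are far apart remain distinguishable after projection by $\mathbf{A}$, which is what allows a receiver to recover the original signal from its compressed measurements.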