Reconstructing Human Pose from Inertial Measurements: A Generative Model-based Compressive Sensing Approach

Nguyen Quang Hieu, Dinh Thai Hoang, Diep N. Nguyen, Mohammad Abu Alsheikh

TL;DR

This paper addresses 3D human pose reconstruction from sparse IMU measurements transmitted over noisy wireless channels for VR/XR. It introduces a two-part framework, with compressive sensing at the transmitter and a pre-trained variational auto-encoder (VAE) at the edge receiver, whose parts together achieve robust pose recovery under power and latency constraints. The key contributions are a power-constrained measurement matrix that satisfies the set-restricted eigenvalue condition (S-REC) and a CS-VAE training scheme that delivers Lasso-like accuracy with decoding that is an order of magnitude faster, plus the ability to synthesize missing data and interpolate poses in the latent space. The approach is evaluated on the DIP-IMU dataset and targets wireless VR/XR deployments with edge computing support and synthetic pose generation.

Abstract

The ability to sense, localize, and estimate the 3D position and orientation of the human body is critical in virtual reality (VR) and extended reality (XR) applications. This becomes more important and challenging with the deployment of VR/XR applications over the next generation of wireless systems such as 5G and beyond. In this paper, we propose a novel framework that can reconstruct the 3D human body pose of the user given sparse measurements from Inertial Measurement Unit (IMU) sensors over a noisy wireless environment. Specifically, our framework enables reliable transmission of compressed IMU signals through noisy wireless channels and effective recovery of such signals at the receiver, e.g., an edge server. This task is very challenging due to the constraints of transmit power, recovery accuracy, and recovery latency. To address these challenges, we first develop a deep generative model at the receiver to recover the data from linear measurements of IMU signals. The linear measurements of the IMU signals are obtained by a linear projection with a measurement matrix based on compressive sensing theory. The key to the success of our framework lies in the novel design of the measurement matrix at the transmitter, which not only satisfies the power constraints of the IMU devices but also yields highly accurate recovery of the IMU signals at the receiver. This is achieved by extending the set-restricted eigenvalue condition of the measurement matrix and combining it with an upper bound for the power transmission constraint. Our framework can achieve robust performance for recovering 3D human poses from noisy compressed IMU signals. Additionally, our pre-trained deep generative model achieves signal reconstruction accuracy comparable to an optimization-based approach, i.e., Lasso, but is an order of magnitude faster.
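
The transmit side described in the abstract (a linear projection, a power limit, and a noisy channel) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the Gaussian matrix, the helper names (`gaussian_matrix`, `enforce_power`, `awgn`), the toy sine trace, and the numeric values. In particular, the sketch meets the power limit by rescaling the measurements, which is a simple stand-in and not the paper's measurement matrix design.

```python
import math
import random


def gaussian_matrix(m, n, rng):
    """Random m x n matrix with i.i.d. N(0, 1/m) entries (a common CS choice)."""
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, scale) for _ in range(n)] for _ in range(m)]


def matvec(A, x):
    """Linear projection y = A x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def enforce_power(y, p_max):
    """Scale y so its average power (mean of squares) does not exceed p_max.

    Assumption: plain rescaling, not the paper's power-constrained design.
    """
    power = sum(v * v for v in y) / len(y)
    if power <= p_max:
        return list(y), 1.0
    gain = math.sqrt(p_max / power)
    return [gain * v for v in y], gain


def awgn(y, noise_std, rng):
    """Add zero-mean Gaussian noise to model the wireless channel."""
    return [v + rng.gauss(0.0, noise_std) for v in y]


rng = random.Random(0)
n, m = 64, 16                                   # signal length, number of measurements
x = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]  # toy stand-in for an IMU trace
A = gaussian_matrix(m, n, rng)
y = matvec(A, x)                                # compressed measurements (m < n)
y_tx, gain = enforce_power(y, p_max=0.5)        # transmitted signal
y_rx = awgn(y_tx, noise_std=0.05, rng=rng)      # what the receiver observes
```

The receiver would then try to recover `x` from `y_rx`, either with an optimization-based solver such as Lasso or, as in the paper, with a pre-trained generative model.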

Paper Structure

This paper contains 24 sections, 1 theorem, 49 equations, 8 figures, 1 table, 1 algorithm.

Key Result

Proposition 1

The recovered signal obtained by the generative model-based compressive sensing method under the power constraint is guaranteed to be a unique solution if [...], where $\sigma_x^2$ and $\mu_x$ are the statistical variance and mean of the source signals $\mathbf{x} \in \mathbb{R}^n$, respectively, and $d > 0$ is a real number derived from Chebyshev's inequality.
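
To make the role of $d$ more concrete, the following sketch checks Chebyshev's inequality empirically: at most a fraction $1/d^2$ of samples can lie $d$ standard deviations or more from the mean. The sample distribution and the helper name `chebyshev_check` are assumptions for illustration only, and how the paper turns this bound into its power constraint is not shown in the text above.

```python
import random
import statistics


def chebyshev_check(samples, d):
    """Return (observed tail fraction, Chebyshev bound 1/d^2)."""
    mu = statistics.fmean(samples)
    sigma = statistics.pstdev(samples)          # population standard deviation
    outside = sum(1 for s in samples if abs(s - mu) >= d * sigma)
    return outside / len(samples), 1.0 / (d * d)


rng = random.Random(1)
# Hypothetical signal samples: a uniform component plus Gaussian jitter.
samples = [rng.uniform(-1.0, 1.0) + rng.gauss(0.0, 0.3) for _ in range(5000)]

results = {d: chebyshev_check(samples, d) for d in (1.5, 2.0, 3.0)}
for d, (frac, bound) in results.items():
    print(f"d={d}: observed tail {frac:.4f} <= bound {bound:.4f}")
```

Equivalently, with probability at least $1 - 1/d^2$ a sample satisfies $|x - \mu_x| < d\,\sigma_x$, which caps its amplitude in terms of $\mu_x$, $\sigma_x$, and $d$.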

Figures (8)

  • Figure 1: An illustration of our proposed system model. A set of synchronized IMU sensors produces a sequence of data, e.g., orientation and acceleration, and compressive sensing down-samples the data sequence into a shorter sequence. The down-sampled sequence of IMU data is transmitted over a noisy channel. The receiver uses a deep generative model to recover the original data sequence from received signals.
  • Figure 2: Illustration of acceleration readings from an IMU sensor placed on the left wrist of the user (top figure) and the Fast Fourier Transform (FFT) of the x-axis acceleration data (bottom figure). The FFT reveals the nearly $k$-sparse property of the IMU signal, in which a few low-frequency coefficients have dominant values. As a result, the data can be approximated by keeping the $k$ largest coefficients and setting the remaining coefficients to zero.
  • Figure 3: The proposed CS-VAE learning algorithm with a novel measurement matrix at the transmitter and the generative model, i.e., a VAE, at the receiver. The transmitted signal at the transmitter is the $m$-dimensional vector $\mathbf{y}$, which is a compressed version of the original $n$-dimensional vector $\mathbf{x}^*$. At the receiver, the VAE recovers the original signal, i.e., $\mathbf{\hat{x}} \approx \mathbf{x}^*$, from a noisy and compressed measurement $\mathbf{\hat{y}}$.
  • Figure 4: Mean square error of reconstructed signals when the number of measurements $m$ increases.
  • Figure 5: Mean square error of reconstructed signals when channel noise power increases.
  • ...and 3 more figures
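
The near-sparsity described in the Figure 2 caption can be sketched with a small frequency-domain experiment: transform a signal, keep only the $k$ largest-magnitude coefficients, and measure the reconstruction error. The synthetic signal, the choice $k = 4$, and the helper names are assumptions for illustration, and a naive DFT is used in place of an FFT library to keep the example dependency-free.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]


def idft(X):
    """Inverse DFT, returning the real part (the input signal is real)."""
    n = len(X)
    return [sum(X[f] * cmath.exp(2j * math.pi * f * t / n) for f in range(n)).real / n
            for t in range(n)]


def keep_k_largest(X, k):
    """Zero every coefficient except the k with the largest magnitude."""
    keep = set(sorted(range(len(X)), key=lambda i: abs(X[i]), reverse=True)[:k])
    return [X[i] if i in keep else 0j for i in range(len(X))]


def mse(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


n = 128
# Synthetic trace: two dominant low-frequency tones plus a weak high-frequency one.
x = [math.sin(2 * math.pi * 1 * t / n)
     + 0.5 * math.cos(2 * math.pi * 3 * t / n)
     + 0.02 * math.sin(2 * math.pi * 40 * t / n)
     for t in range(n)]

X = dft(x)
X_k = keep_k_largest(X, k=4)   # a real tone occupies a conjugate pair of bins
x_k = idft(X_k)
err = mse(x, x_k)
print(f"MSE with 4 of {n} coefficients: {err:.6f}")
```

Only the weak high-frequency tone is lost, so the error stays small even though 124 of the 128 coefficients are discarded.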

Theorems & Definitions (7)

  • Definition 1: Sparsity
  • Definition 2: Restricted Isometry Property
  • Definition 3: Restricted Eigenvalue Condition
  • Definition 4: Set-Restricted Eigenvalue Condition
  • Proposition 1: S-REC with power constraint
  • Remark
  • Definition 5: Chebyshev's inequality
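
For reference, the listed definitions are commonly stated in the compressive sensing literature in the forms below. These are textbook forms, not quotations from the paper, so the paper's notation and constants may differ. The symbols are assumptions: $\mathbf{A} \in \mathbb{R}^{m \times n}$ is the measurement matrix, $S \subseteq \mathbb{R}^n$ is the set of signals of interest (for example, the range of a generative model), $\gamma > 0$ and $\delta \ge 0$ are the S-REC parameters, and $X$ is a random variable with mean $\mu$ and variance $\sigma^2$.

```latex
\begin{align*}
  % k-sparsity: at most k non-zero entries
  \|\mathbf{x}\|_0 &= \bigl|\{\, i : x_i \neq 0 \,\}\bigr| \le k \\
  % Restricted isometry property with constant \delta_k \in (0, 1)
  (1 - \delta_k)\,\|\mathbf{x}\|_2^2 &\le \|\mathbf{A}\mathbf{x}\|_2^2
      \le (1 + \delta_k)\,\|\mathbf{x}\|_2^2
      \quad \text{for all } k\text{-sparse } \mathbf{x} \\
  % Set-restricted eigenvalue condition S-REC(S, \gamma, \delta)
  \|\mathbf{A}(\mathbf{x}_1 - \mathbf{x}_2)\|_2 &\ge
      \gamma\,\|\mathbf{x}_1 - \mathbf{x}_2\|_2 - \delta
      \quad \text{for all } \mathbf{x}_1, \mathbf{x}_2 \in S \\
  % Chebyshev's inequality for any d > 0
  \Pr\bigl(|X - \mu| \ge d\,\sigma\bigr) &\le \frac{1}{d^2}
\end{align*}
```

Read this way, S-REC requires that the measurement matrix keep distinct signals in $S$ distinguishable after projection, which is what makes recovery from compressed measurements possible.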