DiffusionPoser: Real-time Human Motion Reconstruction From Arbitrary Sparse Sensors Using Autoregressive Diffusion

Tom Van Wouwe, Seunghwan Lee, Antoine Falisse, Scott Delp, C. Karen Liu

TL;DR

DiffusionPoser tackles real-time whole-body motion reconstruction from arbitrary sparse sensor configurations. It uses a single autoregressive diffusion model whose inpainting-based denoising incorporates sensor measurements at inference time, so no retraining is needed when the sensor set changes. It supports SMPL and OpenSim skeletons and allows on-the-fly sensor-configuration optimization for different activities, while maintaining accuracy comparable to six-IMU baselines. The method is robust to missing or corrupted signals, extends to multimodal sensing with pressure insoles, and runs in real time at $20$ Hz. This makes it practical for health, performance, and entertainment applications where wearable sensors must be flexible, robust, and responsive.

Abstract

Motion capture from a limited number of body-worn sensors, such as inertial measurement units (IMUs) and pressure insoles, has important applications in health, human performance, and entertainment. Recent work has focused on accurately reconstructing whole-body motion from a specific sensor configuration using six IMUs. While a common goal across applications is to use the minimal number of sensors to achieve required accuracy, the optimal arrangement of the sensors might differ from application to application. We propose a single diffusion model, DiffusionPoser, which reconstructs human motion in real time from an arbitrary combination of sensors, including IMUs placed at specified locations and pressure insoles. Unlike existing methods, our model grants users the flexibility to determine the number and arrangement of sensors tailored to the specific activity of interest, without the need for retraining. A novel autoregressive inference scheme ensures real-time motion reconstruction that closely aligns with measured sensor signals. The generative nature of DiffusionPoser ensures realistic behavior, even for degrees of freedom not directly measured. Qualitative results can be found on our website: https://diffusionposer.github.io/.

Paper Structure

This paper contains 24 sections, 6 equations, 4 figures, 7 tables, and 1 algorithm.

Figures (4)

  • Figure 1: (Left) Examples of live reconstruction using DiffusionPoser. (Right) Subject instrumented with IMUs. We assume IMUs may be attached at 13 specific locations: pelvis, thighs, shanks, feet, arms, wrists, torso, head.
  • Figure 2: DiffIP transformer decoder network. Architecture of the denoiser $\bm{f}_{\theta}(\mathbf{\hat{z}}_t,t,h)$ that predicts the sample ($\hat{\bm{x}}_{0}$) given the noised sample $\bm{z}_t$, denoising step $t$ and the body height $h$. We use the transformer decoder architecture from Vaswani et al. (2017) and use the step embedding and height embedding for cross-attention as well as self-attention by concatenating them to the input embedding.
  • Figure 3: Four-step autoregressive inference with inpainting during denoising. Motion is reconstructed frame by frame in real time following a four-step process. New predictions are shifted into the history and serve as input for the reconstruction at subsequent timesteps.
  • Figure 4: Motion reconstructions with PIP and DiffusionPoser (Ours) for different IMU configurations of a TotalCaptureReal sequence. Yellow: PIP with pelvis, head, wrists and shanks. Grey: ground truth. Purple: Ours with pelvis, head, wrists and shanks. Orange: Ours with wrists, shanks. Blue: Ours with pelvis and wrists. Green: Ours with shanks.
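
The autoregressive inpainting idea summarized in the Figure 2 and Figure 3 captions can be illustrated with a small NumPy sketch. This is not the authors' implementation, and the four steps of the paper's procedure are not spelled out in the captions, so the structure below is an assumption: a sliding window of frames is denoised, entries that are already known (past reconstructions and the measured features of the new frame) are overwritten with a noised copy at every denoising step, and the newest frame is then shifted into the history. The names `toy_denoiser`, `reconstruct_frame`, and `run_stream`, the noise schedule, the window length, and the feature layout are all hypothetical, and `toy_denoiser` is a placeholder for the trained network $\bm{f}_{\theta}$.

```python
import numpy as np


def make_alphas(num_steps):
    """Cumulative signal-retention schedule (assumed linear betas)."""
    betas = np.linspace(1e-4, 0.5, num_steps)
    return np.cumprod(1.0 - betas)


def toy_denoiser(z_t, t, h):
    """Hypothetical stand-in for f_theta(z_t, t, h).

    Returns an estimate of the clean window x0. It only averages
    neighbouring frames; a trained network would go here.
    """
    x0 = z_t.copy()
    x0[1:] = 0.5 * (z_t[1:] + z_t[:-1])
    return x0


def reconstruct_frame(history, measured, mask, h, alphas, rng,
                      denoiser=toy_denoiser):
    """Denoise one window whose last row is the new, partly observed frame.

    history:  (W-1, D) previously reconstructed frames
    measured: (D,) values for the new frame, valid where mask is True
    mask:     (D,) bool, which features of the new frame are observed
    """
    n_hist, dim = history.shape
    window = n_hist + 1
    known = np.vstack([history, np.where(mask, measured, 0.0)])
    known_mask = np.vstack([np.ones((n_hist, dim), dtype=bool), mask[None, :]])

    z = rng.standard_normal((window, dim))  # start from pure noise
    for t in range(len(alphas) - 1, -1, -1):
        a = alphas[t]
        # Inpainting: overwrite known entries with a copy noised to level t.
        noise = rng.standard_normal((window, dim))
        noised_known = np.sqrt(a) * known + np.sqrt(1.0 - a) * noise
        z = np.where(known_mask, noised_known, z)
        # Predict the clean window, then re-noise it to the previous level.
        x0 = denoiser(z, t, h)
        if t > 0:
            a_prev = alphas[t - 1]
            noise = rng.standard_normal((window, dim))
            z = np.sqrt(a_prev) * x0 + np.sqrt(1.0 - a_prev) * noise
        else:
            z = x0
    # Assumption: known entries are enforced exactly in the final output.
    z = np.where(known_mask, known, z)
    return z[-1]


def run_stream(frames_measured, mask, h, window=4, num_steps=5, seed=0):
    """Reconstruct frames one at a time, shifting each result into history."""
    rng = np.random.default_rng(seed)
    alphas = make_alphas(num_steps)
    dim = frames_measured.shape[1]
    history = np.zeros((window - 1, dim))
    outputs = []
    for measured in frames_measured:
        new_frame = reconstruct_frame(history, measured, mask, h, alphas, rng)
        outputs.append(new_frame)
        # Drop the oldest frame and append the newest reconstruction.
        history = np.vstack([history[1:], new_frame[None, :]])
    return np.array(outputs)
```

Changing `mask` between calls models a different sensor set without touching the denoiser, which is the property the abstract emphasizes. In this toy version the unobserved features are filled in only by the placeholder smoother, so the outputs carry no physical meaning.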