3D Human Scan With A Moving Event Camera

Kai Kohyama, Shintaro Shiba, Yoshimitsu Aoki

TL;DR

The paper tackles 3D human body capture by moving an event camera around a stationary subject and applying three stages: contour-event classification, distance-weighted voxel carving, and SMPL-based mesh fitting on the carved volume. The pipeline is event-only: it treats each event as $e_k=(x_k,y_k,t_k,p_k)$ and uses a distance-based attenuation to preserve fine detail. It recovers accurate joints and body shape without frame data and remains robust under motion blur, where frame-based methods struggle. The paper reports state-of-the-art pose and mesh accuracy on synthetic datasets and shows strong qualitative meshes, highlighting the potential of high-temporal-resolution sensors for robust 3D human scanning. It also discusses limitations and future directions, including SLAM integration and NeRF-style representations for richer scene understanding.
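As a minimal sketch of the event representation $e_k=(x_k,y_k,t_k,p_k)$ mentioned above, the following stores events in a NumPy structured array and selects a time window. The names `EVENT_DTYPE`, `make_events`, and `slice_by_time`, the field types, and the polarity encoding of +1/-1 are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

# Hypothetical layout for one event e_k = (x_k, y_k, t_k, p_k):
# pixel coordinates, timestamp in seconds, and polarity (+1 or -1).
EVENT_DTYPE = np.dtype(
    [("x", np.uint16), ("y", np.uint16), ("t", np.float64), ("p", np.int8)]
)


def make_events(xs, ys, ts, ps):
    """Pack parallel sequences into a structured event array."""
    events = np.empty(len(xs), dtype=EVENT_DTYPE)
    events["x"] = xs
    events["y"] = ys
    events["t"] = ts
    events["p"] = ps
    return events


def slice_by_time(events, t_start, t_end):
    """Return events with t_start <= t < t_end (events must be sorted by t)."""
    lo = np.searchsorted(events["t"], t_start, side="left")
    hi = np.searchsorted(events["t"], t_end, side="left")
    return events[lo:hi]


events = make_events(
    xs=[10, 11, 12, 13],
    ys=[5, 5, 6, 6],
    ts=[0.001, 0.002, 0.004, 0.007],
    ps=[1, -1, 1, 1],
)
window = slice_by_time(events, 0.002, 0.007)
```

Because events arrive asynchronously and sorted by timestamp, a binary search over `t` is enough to pair a batch of events with the camera pose at that moment.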

Abstract

Capturing a 3D human body is an important task in computer vision with a wide range of applications such as virtual reality and sports analysis. However, conventional frame cameras are limited by their temporal resolution and dynamic range, which imposes constraints in real-world application setups. Event cameras have the advantages of high temporal resolution and high dynamic range (HDR), but event-based methods must be developed to handle data with different characteristics. This paper proposes a novel event-based method for 3D pose estimation and human mesh recovery. Prior work on event-based human mesh recovery requires frames (images) as well as event data. The proposed method relies solely on events; it carves 3D voxels by moving the event camera around a stationary body, reconstructs the human pose and mesh using attenuated rays, and fits statistical body models, preserving high-frequency details. The experimental results show that the proposed method outperforms conventional frame-based methods in the estimation accuracy of both pose and body mesh. We also demonstrate results in challenging situations where a conventional camera has motion blur. This is the first work to demonstrate event-only human mesh recovery, and we hope that it is the first step toward achieving robust and accurate 3D human body scanning from vision sensors. https://florpeng.github.io/event-based-human-scan/
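The idea of carving voxels with attenuated rays can be illustrated with a rough sketch. The abstract does not give the attenuation formula or the carving rule, so everything specific here is an assumption: the exponential falloff `exp(-d / tau)`, the parameter `tau`, the fixed-step ray sampling, and the helper names `pixel_ray` and `carve_ray` are all hypothetical. The code only accumulates a distance-attenuated score per voxel along the ray of one contour event; how such scores are thresholded into a carved volume is left open.

```python
import numpy as np


def pixel_ray(K, R, x, y):
    """Unit world-space direction of the ray through pixel (x, y).

    K is the 3x3 pinhole intrinsic matrix and R the 3x3
    camera-to-world rotation (both assumed known from calibration).
    """
    d_cam = np.linalg.solve(K, np.array([x, y, 1.0]))
    d_world = R @ d_cam
    return d_world / np.linalg.norm(d_world)


def carve_ray(scores, origin, direction, grid_min, voxel_size,
              tau=1.0, max_dist=4.0):
    """Add an attenuated weight to each voxel crossed by one ray.

    The weight exp(-d / tau) is an illustrative choice, where d is the
    distance from the camera to the first sample inside the voxel.
    """
    step = 0.5 * voxel_size
    dists = np.arange(0.0, max_dist, step)
    points = origin + dists[:, None] * direction
    idx = np.floor((points - grid_min) / voxel_size).astype(int)
    inside = np.all((idx >= 0) & (idx < np.array(scores.shape)), axis=1)
    idx, dists = idx[inside], dists[inside]
    # Keep one sample per voxel; the first occurrence is the nearest one.
    uniq, first = np.unique(idx, axis=0, return_index=True)
    scores[uniq[:, 0], uniq[:, 1], uniq[:, 2]] += np.exp(-dists[first] / tau)
    return scores


# Toy setup: an 8x8x8 grid of 0.25 m voxels, camera looking along +z.
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
R = np.eye(3)
camera_center = np.array([0.0, 0.0, -1.05])
grid_min = np.array([-1.125, -1.125, 0.0])
scores = np.zeros((8, 8, 8))

direction = pixel_ray(K, R, 50, 50)  # ray through the principal point
scores = carve_ray(scores, camera_center, direction, grid_min, 0.25)
```

In this toy setup the ray passes through a single column of voxels, and the accumulated weight decreases with distance from the camera, so nearby observations contribute more than distant ones.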

Paper Structure

This paper contains 21 sections, 9 equations, 13 figures, and 5 tables.

Figures (13)

  • Figure 1: Summary of the proposed method. Our method reconstructs the human body mesh and estimates the pose only from an event camera that moves around the body.
  • Figure 2: Overview of the proposed method.
  • Figure 3: Contour classification network.
  • Figure 4: Carving.
  • Figure 5: SMPL fitting.
  • ...and 8 more figures