3D Human Scan With A Moving Event Camera

Kai Kohyama; Shintaro Shiba; Yoshimitsu Aoki

TL;DR

The paper tackles 3D human body capture with a moving event camera: the sensor is moved around a stationary subject, and the pipeline performs contour-event classification, distance-weighted voxel carving, and SMPL-based mesh fitting on the carved volume. The pipeline uses events only, each represented as $e_k=(x_k,y_k,t_k,p_k)$, and applies a distance-based attenuation during carving to preserve fine detail. Without any frame data, it recovers accurate joints and body shape and remains robust to motion blur, where frame-based methods struggle. It reports higher pose and mesh accuracy than frame-based baselines on synthetic datasets and produces strong qualitative meshes, highlighting the potential of high-temporal-resolution sensors for robust 3D human scanning. The work also discusses limitations and future directions, including SLAM integration and NeRF-style representations for richer scene understanding.
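To make the event representation concrete, here is a minimal Python/NumPy sketch that stores events $e_k=(x_k,y_k,t_k,p_k)$ in a structured array, slices them into a short time window, and accumulates a per-pixel event-count image. The names `EVENT_DTYPE`, `events_in_window`, `count_image`, and `crude_contour_mask` are hypothetical. In particular, the paper classifies contour events with a learned network; the count threshold `min_count` below is only a placeholder heuristic for that step, not the authors' method.

```python
import numpy as np

# Event record e_k = (x_k, y_k, t_k, p_k): pixel coords, timestamp in seconds, polarity (+1/-1).
EVENT_DTYPE = np.dtype([("x", np.int16), ("y", np.int16), ("t", np.float64), ("p", np.int8)])

def events_in_window(events, t0, dt):
    """Return events with t0 <= t_k < t0 + dt (events must be sorted by timestamp)."""
    lo = np.searchsorted(events["t"], t0, side="left")
    hi = np.searchsorted(events["t"], t0 + dt, side="left")
    return events[lo:hi]

def count_image(events, height, width):
    """Accumulate the number of events per pixel, ignoring polarity."""
    img = np.zeros((height, width), dtype=np.int32)
    np.add.at(img, (events["y"], events["x"]), 1)  # unbuffered add handles repeated pixels
    return img

def crude_contour_mask(events, t0, dt, height, width, min_count=2):
    """Hypothetical stand-in for contour classification: pixels that fire at least
    min_count times in a short window are treated as candidate contour pixels."""
    window = events_in_window(events, t0, dt)
    return count_image(window, height, width) >= min_count
```

Because the camera moves continuously, short windows give per-view evidence that a later carving step can consume; a faithful implementation would replace the threshold with the learned classifier.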

Abstract

Capturing the 3D human body is an important task in computer vision, with a wide range of applications such as virtual reality and sports analysis. However, conventional frame cameras are limited by their temporal resolution and dynamic range, which constrains real-world application setups. Event cameras offer high temporal resolution and high dynamic range (HDR), but because event data have different characteristics from frames, dedicated event-based methods are needed. This paper proposes a novel event-based method for 3D pose estimation and human mesh recovery. Prior work on event-based human mesh recovery requires frames (images) in addition to event data. The proposed method relies solely on events: it carves 3D voxels by moving the event camera around a stationary body, reconstructs the human pose and mesh using attenuated rays, and fits statistical body models while preserving high-frequency details. The experimental results show that the proposed method outperforms conventional frame-based methods in the estimation accuracy of both pose and body mesh. We also demonstrate results in challenging situations where a conventional camera suffers from motion blur. This is the first work to demonstrate event-only human mesh recovery, and we hope it is a first step toward robust and accurate 3D human body scanning from vision sensors. https://florpeng.github.io/event-based-human-scan/
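The carving step can be illustrated with a soft visual hull. The following Python/NumPy sketch assumes calibrated pinhole cameras with known poses and a binary foreground mask per view (for example, obtained by filling classified contour events); none of these details, nor the weighting, come from the paper. Each view votes on every voxel it sees with a weight $w=\exp(-d/\lambda)$, where $d$ is the voxel's depth in that view, and a voxel survives if its weighted foreground ratio is at least $\tau$. The helpers `look_at`, `project`, and `carve`, the exponential attenuation, and the parameters `lam` and `tau` are hypothetical stand-ins for the paper's distance-based attenuation; setting `tau=1.0` recovers the classic hard visual hull.

```python
import numpy as np

def look_at(C, up=(0.0, 1.0, 0.0)):
    """Rotation R and translation t for a camera at world position C looking at the origin."""
    C = np.asarray(C, dtype=float)
    z = -C / np.linalg.norm(C)                 # optical axis points toward the origin
    x = np.cross(np.asarray(up, dtype=float), z)
    x /= np.linalg.norm(x)
    y = np.cross(z, x)                         # completes a right-handed camera frame
    R = np.stack([x, y, z])                    # rows are the camera axes in world coordinates
    t = -R @ C
    return R, t

def project(points, K, R, t, eps=1e-9):
    """Pinhole projection of (N,3) world points to (N,2) pixel coords plus (N,) depths."""
    cam = points @ R.T + t                     # world -> camera coordinates
    depth = cam[:, 2]
    safe = np.where(depth > eps, depth, 1.0)   # avoid dividing by ~0 for points behind the camera
    uvw = cam @ K.T
    return uvw[:, :2] / safe[:, None], depth

def carve(voxels, views, lam=2.0, tau=0.9, eps=1e-9):
    """Distance-weighted soft visual hull (hypothetical weighting scheme).

    voxels: (N,3) voxel centres; views: list of (K, R, t, mask) with boolean (H,W) masks.
    Returns a boolean (N,) array that is True for voxels that survive carving.
    """
    num = np.zeros(len(voxels))                # weighted foreground votes
    den = np.zeros(len(voxels))                # total weight of the views that see each voxel
    for K, R, t, mask in views:
        uv, depth = project(voxels, K, R, t, eps)
        h, w = mask.shape
        u = np.round(uv[:, 0]).astype(int)
        v = np.round(uv[:, 1]).astype(int)
        seen = (depth > eps) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        weight = np.exp(-np.where(seen, depth, 0.0) / lam)  # nearer views weigh more
        fg = np.zeros(len(voxels), dtype=bool)
        fg[seen] = mask[v[seen], u[seen]]
        den[seen] += weight[seen]
        num[seen & fg] += weight[seen & fg]
    # Voxels seen by no view are carved; tau=1.0 requires foreground in every view.
    return (den > 0) & (num >= tau * den)
```

For example, placing several `look_at` cameras on a circle around a sphere and passing each view's disk-shaped mask to `carve` keeps voxels inside the sphere and removes those that project onto the background in enough views. Lowering `tau` makes the hull more tolerant of noisy masks, at the cost of a looser fit.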

Paper Structure

This paper contains 21 sections, 9 equations, 13 figures, and 5 tables.

Introduction
Related Work
Frame-based Human Pose and Mesh Estimation
Event-based Human Pose and Mesh Estimation
Visual Hull
Method
Event Cameras
Contour Classification
Voxel Carving
SMPL Fitting
Experiments
Dataset
Evaluation Metrics and Baselines
Comparison with Frame-based Methods
Results on Motion-Blur Sequences
...and 6 more sections

Figures (13)

Figure 1: Summary of the proposed method. Our method reconstructs the human body mesh and estimates the pose only from an event camera that moves around the body.
Figure 2: Overview of the proposed method.
Figure 3: Contour classification network.
Figure 4: Carving.
Figure 5: SMPL fitting.
...and 8 more figures
