3D Human Scan With A Moving Event Camera
Kai Kohyama, Shintaro Shiba, Yoshimitsu Aoki
TL;DR
The paper tackles 3D human body capture by moving an event camera around a stationary subject and applying contour-event classification, distance-weighted voxel carving, and SMPL-based mesh fitting on the carved volume. This event-only pipeline, which treats events as $e_k=(x_k,y_k,t_k,p_k)$ (pixel coordinates, timestamp, and polarity) and employs a distance-based attenuation to preserve fine detail, recovers accurate joints and body shape without frame data and remains robust to motion blur where frame-based methods struggle. It demonstrates state-of-the-art pose and mesh accuracy on synthetic datasets and provides strong qualitative meshes, highlighting the potential of high-temporal-resolution sensors for robust 3D human scanning. The work also discusses limitations and future directions, including SLAM integration and NeRF-style representations for richer scene understanding.
Abstract
Capturing a 3D human body is one of the important tasks in computer vision, with a wide range of applications such as virtual reality and sports analysis. However, conventional frame cameras are limited by their temporal resolution and dynamic range, which constrains real-world application setups. Event cameras offer high temporal resolution and high dynamic range (HDR), but their data have different characteristics from frames, so dedicated event-based methods must be developed. This paper proposes a novel event-based method for 3D pose estimation and human mesh recovery. Prior work on event-based human mesh recovery requires frames (images) as well as event data. The proposed method relies solely on events: it carves 3D voxels by moving the event camera around a stationary body, reconstructs the human pose and mesh using attenuated rays, and fits statistical body models, preserving high-frequency details. The experimental results show that the proposed method outperforms conventional frame-based methods in the estimation accuracy of both pose and body mesh. We also demonstrate results in challenging situations where a conventional camera suffers from motion blur. This is the first work to demonstrate event-only human mesh recovery, and we hope that it is a first step toward robust and accurate 3D human body scanning from vision sensors. https://florpeng.github.io/event-based-human-scan/
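To make the idea of attenuated rays over a voxel grid concrete, the following is a minimal sketch and not the authors' implementation. It assumes a pinhole camera with known intrinsics `K`, a hypothetical `pose_at(t)` lookup that returns the camera-to-world rotation and camera centre at an event's timestamp, events that have already been classified as contour events, and an exponential attenuation `exp(-d / tau)` chosen purely for illustration; the summary above does not give the actual weighting function. Each event $e_k=(x_k,y_k,t_k,p_k)$ casts a ray into the grid and adds a distance-attenuated weight to every voxel it samples.

```python
import numpy as np


def accumulate_event_rays(events, pose_at, K, grid_min, voxel_size, grid_shape,
                          tau=1.0, n_samples=64, max_depth=4.0):
    """Accumulate distance-attenuated ray weights from events into a voxel grid.

    events   : (N, 4) array of (x, y, t, p) rows
    pose_at  : hypothetical callable t -> (R, c), camera-to-world rotation (3x3)
               and camera centre (3,) at time t
    K        : (3, 3) pinhole intrinsics
    grid_min : (3,) world coordinates of the grid's minimum corner
    tau      : decay length of the ASSUMED exponential attenuation
    """
    votes = np.zeros(grid_shape, dtype=np.float64)
    K_inv = np.linalg.inv(K)
    shape = np.array(grid_shape)

    # Sample depths at the midpoints of n_samples equal segments along the ray.
    depths = (np.arange(n_samples) + 0.5) * (max_depth / n_samples)
    # Assumed attenuation: weight decays with distance from the camera.
    weights = np.exp(-depths / tau)

    for x, y, t, _polarity in events:
        R, c = pose_at(t)
        # Back-project the pixel to a unit-length ray direction in world space.
        direction = R @ (K_inv @ np.array([x, y, 1.0]))
        direction /= np.linalg.norm(direction)

        # 3D sample points along the ray and their voxel indices.
        points = c + depths[:, None] * direction
        idx = np.floor((points - grid_min) / voxel_size).astype(int)

        # Keep only the samples that fall inside the grid.
        inside = np.all((idx >= 0) & (idx < shape), axis=1)
        # np.add.at accumulates correctly when several samples hit one voxel.
        np.add.at(votes, tuple(idx[inside].T), weights[inside])

    return votes
```

In this sketch, voxels crossed by many contour-event rays from different viewpoints collect large totals, and a threshold on `votes` would give an occupancy volume to which a body model could then be fitted. Polarity is ignored here for simplicity.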
