EventEgo3D: 3D Human Motion Capture from Egocentric Event Streams
Christen Millerdurai, Hiroyasu Akada, Jian Wang, Diogo Luvizon, Christian Theobalt, Vladislav Golyanik
TL;DR
The paper tackles 3D human motion capture from egocentric event streams recorded by a monocular event camera with a fisheye lens. It introduces EventEgo3D (EE3D), an end-to-end neural pipeline that converts high-temporal-resolution event data into 3D poses via a two-stage architecture: an Egocentric Pose Module (EPM) for 2D heatmap estimation and 3D lifting, and a Residual Event Propagation Module (REPM) that emphasizes wearer-related events and propagates information from past frames. The authors build two datasets, EE3D-S (synthetic) and EE3D-R (real), which enable training and evaluation for this new modality. EE3D runs in real time at 140 Hz and achieves superior 3D accuracy, particularly in challenging fast-motion scenarios. The work also provides a prototype head-mounted device and extensive ablations, showing that event-based egocentric vision can surpass RGB-based approaches under varying illumination and motion conditions, with strong potential for mobile, low-power HMD applications.
Abstract
Monocular egocentric 3D human motion capture is a challenging and actively researched problem. Existing methods use synchronously operating visual sensors (e.g., RGB cameras) and often fail under low lighting and fast motions, which can be restrictive in many applications involving head-mounted devices. In response to these limitations, this paper 1) introduces a new problem, i.e., 3D human motion capture from an egocentric monocular event camera with a fisheye lens, and 2) proposes the first approach to it, called EventEgo3D (EE3D). Event streams have high temporal resolution and provide reliable cues for 3D human motion capture under high-speed human motions and rapidly changing illumination. The proposed EE3D framework is specifically tailored for learning with event streams in the LNES representation, enabling high 3D reconstruction accuracy. We also design a prototype of a mobile head-mounted device with an event camera and record a real dataset with event observations and ground-truth 3D human poses (in addition to the synthetic dataset). Our EE3D demonstrates robustness and superior 3D accuracy compared to existing solutions across various challenging experiments while supporting real-time 3D pose update rates of 140 Hz.
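The abstract does not spell out the LNES representation. As a rough illustration only, the sketch below shows one common way to build a Locally-Normalised Event Surface: events within a fixed time window are written into a two-channel image (one channel per polarity), each pixel stores the event timestamp normalised to [0, 1) over the window, and newer events overwrite older ones. The function name `lnes_frame`, the `(x, y, t, polarity)` event layout, and the window parameters are assumptions made for this example and are not taken from the paper's code.

```python
import numpy as np

def lnes_frame(events, height, width, t_start, window):
    """Accumulate events into a 2-channel LNES-style frame (illustrative sketch).

    events: iterable of (x, y, t, polarity) tuples, sorted by time t,
            with polarity 0 (negative) or 1 (positive). This layout is assumed.
    Returns an array of shape (2, height, width) with values in [0, 1).
    """
    frame = np.zeros((2, height, width), dtype=np.float32)
    t_end = t_start + window
    for x, y, t, p in events:
        # Keep only events that fall inside the current time window.
        if t_start <= t < t_end:
            # Store the normalised timestamp; later events overwrite earlier ones.
            frame[int(p), int(y), int(x)] = (t - t_start) / window
    return frame

# Hypothetical toy input: four events on a 4x4 sensor, window of 1.0 time units.
toy_events = [
    (1, 2, 0.10, 1),  # positive event at pixel (x=1, y=2)
    (3, 0, 0.25, 0),  # negative event at pixel (x=3, y=0)
    (1, 2, 0.50, 1),  # newer positive event at the same pixel, overwrites 0.10
    (0, 0, 2.00, 0),  # outside the window, ignored
]
frame = lnes_frame(toy_events, height=4, width=4, t_start=0.0, window=1.0)
```

In this form, recent events appear with values close to 1 and older ones close to 0, so a single frame retains coarse temporal ordering within the window. A network consuming such frames at a high rate would process one window per pose update.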
