Event-based Egocentric Human Pose Estimation in Dynamic Environment
Wataru Ikeda, Masashi Hatano, Ryosei Hara, Mariko Isogawa
TL;DR
This work tackles egocentric 3D human pose estimation from a front-facing, head-mounted event camera in dynamic environments. It introduces D-EventEgo, a three-stage pipeline that (1) voxelizes the event stream into a voxel grid $V \in \mathbb{R}^{T \times H \times W \times B}$ from which background events are extracted, (2) estimates head poses $H \in \mathbb{R}^{T \times D'}$, and (3) generates full-body poses $X \in \mathbb{R}^{T \times D}$ with a conditional diffusion model. Key contributions include a Motion Segmentation Module that removes dynamic objects, a synthetic event dataset derived from EgoBody, and experiments showing improvements over an RGB baseline on four of five metrics. The results demonstrate robustness to low light and motion blur, highlight the potential of event cameras for practical egocentric pose estimation, and point to future work that integrates RGB data and environmental context.
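The summary does not say how events are binned into $V$. As a hedged illustration only, the sketch below uses a common event-voxel scheme that is assumed here, not confirmed as D-EventEgo's: the stream is cut into $T$ equal-duration windows, and inside each window every event's polarity is split linearly between its two nearest of $B$ temporal bins. The function name `events_to_voxel_grid` and its arguments are hypothetical.

```python
import numpy as np


def events_to_voxel_grid(xs, ys, ts, ps, T, H, W, B):
    """Accumulate an event stream into a voxel grid of shape (T, H, W, B).

    Assumed scheme (illustrative, not necessarily the paper's): the stream is
    cut into T equal-duration windows, and inside each window an event's
    polarity is split linearly between its two nearest of B temporal bins.
    """
    xs = np.asarray(xs, dtype=np.int64)
    ys = np.asarray(ys, dtype=np.int64)
    ts = np.asarray(ts, dtype=np.float64)
    # Map polarity to +1 / -1 (accepts either {0, 1} or {-1, +1} encodings).
    pol = np.where(np.asarray(ps) > 0, 1.0, -1.0)

    grid = np.zeros((T, H, W, B), dtype=np.float64)
    t0 = ts.min()
    span = max(ts.max() - t0, 1e-9)  # guard against a zero-length stream

    # Window index k and normalized time inside that window.
    scaled = (ts - t0) / span * T
    k = np.minimum(np.floor(scaled).astype(np.int64), T - 1)
    local = scaled - k  # in [0, 1]

    # Linear interpolation between neighbouring temporal bins.
    pos = local * (B - 1)
    lo = np.floor(pos).astype(np.int64)
    hi = np.minimum(lo + 1, B - 1)
    w_hi = pos - lo
    w_lo = 1.0 - w_hi

    # np.add.at accumulates correctly even when indices repeat.
    np.add.at(grid, (k, ys, xs, lo), pol * w_lo)
    np.add.at(grid, (k, ys, xs, hi), pol * w_hi)
    return grid
```

Because each event's two weights sum to 1, the grid's total equals the net polarity of the stream, which makes a cheap sanity check when swapping in a different binning scheme.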
Abstract
Estimating human pose with a front-facing egocentric camera is essential for applications such as sports motion analysis, VR/AR, and AI for wearable devices. However, many existing methods rely on RGB cameras and do not account for low-light environments or motion blur. Event-based cameras have the potential to address these challenges. In this work, we introduce a novel task of human pose estimation using a front-facing event-based camera mounted on the head and propose D-EventEgo, the first framework for this task. The proposed method first estimates head poses, which are then used as conditions to generate body poses. However, events from dynamic objects mixed with background events can reduce head pose estimation accuracy. We therefore introduce the Motion Segmentation Module to remove dynamic objects and extract background information. Extensive experiments on our synthetic event-based dataset derived from EgoBody demonstrate that our approach outperforms our baseline in four out of five evaluation metrics in dynamic environments.
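The abstract describes a head-then-body flow in which dynamic objects are removed before head pose estimation. The sketch below wires that flow together under loudly stated assumptions: the Motion Segmentation Module is treated as a black box that returns a per-pixel probability of belonging to a moving object, background extraction is approximated by hard thresholding of that probability, and `segment`, `head_net`, and `body_sampler` are hypothetical placeholders for the learned networks (the last standing in for the conditional diffusion model). It shows only how tensors flow between stages, not the paper's architecture.

```python
from typing import Callable, Tuple

import numpy as np


def remove_dynamic_events(voxels: np.ndarray, dynamic_prob: np.ndarray,
                          threshold: float = 0.5) -> np.ndarray:
    """Keep only voxels at pixels judged static (background).

    voxels: (T, H, W, B) event voxel grid.
    dynamic_prob: (T, H, W) assumed per-pixel probability of being dynamic.
    """
    keep = (dynamic_prob < threshold).astype(voxels.dtype)
    return voxels * keep[..., None]  # broadcast the mask over the B bins


def estimate_poses(voxels: np.ndarray,
                   segment: Callable[[np.ndarray], np.ndarray],
                   head_net: Callable[[np.ndarray], np.ndarray],
                   body_sampler: Callable[[np.ndarray], np.ndarray],
                   threshold: float = 0.5) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Hypothetical flow: segment -> mask -> head pose -> head-conditioned body pose."""
    dynamic_prob = segment(voxels)   # stand-in for the Motion Segmentation Module
    background = remove_dynamic_events(voxels, dynamic_prob, threshold)
    head = head_net(background)      # (T, D') head poses
    body = body_sampler(head)        # (T, D) body poses conditioned on head poses
    return background, head, body
```

Hard masking is the simplest choice; a soft alternative would weight voxels by `1 - dynamic_prob` instead, keeping low-confidence background events rather than discarding them.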
