EventEgoHands: Event-based Egocentric 3D Hand Mesh Reconstruction

Ryosei Hara, Wataru Ikeda, Masashi Hatano, Mariko Isogawa

TL;DR

EventEgoHands introduces the first event-based approach to egocentric 3D hand mesh reconstruction. It handles the background events caused by the camera wearer's own movements with a Hand Segmentation Module and reconstructs the hands with a MANO-based Reconstruction Module. The method is trained and evaluated on N-HOT3D, a new synthetic egocentric event dataset with 447,704 samples, and achieves substantial gains over baselines in R-AUC, MPJPE, and MPVPE. It combines the filtered event data with Cross-Attention between the two hands and a multi-term loss that jointly optimizes hand pose, shape, and mesh accuracy. The work advances practical egocentric hand tracking for AR/VR and robotics, while acknowledging remaining challenges with occlusion and fine fingertip motion, and suggests hand-object interaction as future work.
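The TL;DR names Cross-Attention between the two hands but does not spell out its form. The following is a minimal sketch, assuming standard scaled dot-product cross-attention in which one hand's feature tokens act as queries over the other hand's tokens. The token counts, feature sizes, random projection matrices, and the names `softmax` and `cross_attention` are hypothetical and not taken from the paper.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the row max for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_feats, context_feats, w_q, w_k, w_v):
    """Scaled dot-product cross-attention: tokens of one hand attend to the other hand.

    query_feats:   (N_q, D) features of the hand being updated
    context_feats: (N_c, D) features of the other hand
    w_q, w_k, w_v: (D, D_h) projection matrices (hypothetical, random here)
    """
    q = query_feats @ w_q
    k = context_feats @ w_k
    v = context_feats @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (N_q, N_c) similarity scores
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights              # (N_q, D_h) context features


rng = np.random.default_rng(0)
D, D_h = 32, 16
left = rng.standard_normal((8, D))   # hypothetical left-hand tokens
right = rng.standard_normal((8, D))  # hypothetical right-hand tokens
w_q, w_k, w_v = (rng.standard_normal((D, D_h)) / np.sqrt(D) for _ in range(3))

# Each hand queries the other, so each hand's features can draw on the other hand's context.
left_ctx, attn_lr = cross_attention(left, right, w_q, w_k, w_v)
right_ctx, attn_rl = cross_attention(right, left, w_q, w_k, w_v)
print(left_ctx.shape, right_ctx.shape)  # (8, 16) (8, 16)
```

For brevity, both directions share one set of projections in this sketch; a trained model could just as well learn separate projections per direction.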

Abstract

Reconstructing 3D hand meshes is a challenging but important task for human-computer interaction and AR/VR applications. RGB and/or depth cameras have been widely used for this task. However, methods based on these conventional cameras struggle in low-light environments and under motion blur. To address these limitations, event cameras have attracted attention in recent years for their high dynamic range and high temporal resolution. Despite these advantages, event cameras are sensitive to background noise and camera motion, which has limited existing studies to static backgrounds and fixed cameras. In this study, we propose EventEgoHands, a novel method for event-based 3D hand mesh reconstruction from an egocentric view. Our approach introduces a Hand Segmentation Module that extracts hand regions, effectively mitigating the influence of dynamic background events. We evaluated our approach on the N-HOT3D dataset and demonstrated its effectiveness, improving MPJPE by more than 4.5 cm (approximately 43%).
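MPJPE (mean per-joint position error) and MPVPE (mean per-vertex position error) average the Euclidean distance between predicted and ground-truth 3D points over hand joints and mesh vertices, respectively. A minimal NumPy sketch follows. Whether the paper aligns a root joint before measuring is an assumption, exposed here as the optional `root_idx` argument; the helper names and the toy data (21 joints shifted by a uniform offset) are hypothetical.

```python
import numpy as np


def mpjpe(pred, gt, root_idx=None):
    """Mean per-joint position error: average Euclidean distance over all points.

    pred, gt: (N, J, 3) arrays in the same unit (e.g., cm).
    root_idx: if given, both point sets are translated so that this joint sits at
              the origin (root-relative evaluation; an assumed option here).
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    if root_idx is not None:
        pred = pred - pred[:, root_idx:root_idx + 1]
        gt = gt - gt[:, root_idx:root_idx + 1]
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


# MPVPE is the same computation over mesh vertices (778 for a MANO mesh) instead of joints.
mpvpe = mpjpe


def relative_improvement(baseline, ours):
    """Fraction by which `ours` reduces the baseline error."""
    return (baseline - ours) / baseline


gt = np.zeros((1, 21, 3))
pred = gt + np.array([3.0, 4.0, 0.0])   # every joint shifted by a 5-unit offset
print(mpjpe(pred, gt))                  # 5.0
print(mpjpe(pred, gt, root_idx=0))      # 0.0 (a global shift cancels out)
print(relative_improvement(10.0, 6.0))  # 0.4
```

The toy case shows why the evaluation convention matters: a uniform translation of the whole hand gives a 5-unit absolute error but zero root-relative error.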

Paper Structure

This paper contains 17 sections, 8 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: Egocentric event camera problem. A fixed third-person view is limited to specific scenarios, while an egocentric view offers greater flexibility and mobility. However, in an egocentric event camera setup, the camera wearer's movements can generate numerous background events, making it challenging to accurately recognize the hands.
  • Figure 2: Overview of EventEgoHands.
  • Figure 3: Sample from N-HOT3D. We used the MANO ground-truth annotations and RGB images directly from HOT3D [hot3d] and provided the raw events and hand masks ourselves.
  • Figure 4: Qualitative evaluation. We compared our method with EventHands [EventHands] and Ev2Hands [ev2hands] as baselines. Red arrows indicate failure regions. RGB images are shown for reference only and were not used as input.
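Figures 1 and 3 point at the core idea: a hand mask can suppress the background events produced by the wearer's head motion. The sketch below assumes events stored as (x, y, t, p) rows and a binary per-pixel hand mask (for example, a predicted mask or an N-HOT3D ground-truth mask). It keeps only events that fall on hand pixels and then accumulates them into a two-channel polarity count image. The array layout, the count-image representation, and both function names are illustrative assumptions; the paper's actual Hand Segmentation Module and event representation may differ.

```python
import numpy as np


def filter_events_by_mask(events, hand_mask):
    """Keep only events whose pixel falls inside a binary hand mask.

    events:    (N, 4) array with columns (x, y, t, p); x, y are pixel coordinates.
    hand_mask: (H, W) boolean array, True on hand pixels.
    """
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    h, w = hand_mask.shape
    # Drop events outside the sensor bounds before indexing the mask.
    in_bounds = (x >= 0) & (x < w) & (y >= 0) & (y < h)
    keep = np.zeros(len(events), dtype=bool)
    keep[in_bounds] = hand_mask[y[in_bounds], x[in_bounds]]
    return events[keep]


def events_to_count_image(events, height, width):
    """Accumulate events into a (2, H, W) count image: channel 0 = positive, 1 = negative polarity."""
    img = np.zeros((2, height, width), dtype=np.int32)
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    channel = (events[:, 3] <= 0).astype(int)
    np.add.at(img, (channel, y, x), 1)  # unbuffered add, so repeated pixels are counted correctly
    return img


H, W = 4, 6
mask = np.zeros((H, W), dtype=bool)
mask[1:3, 2:4] = True  # toy "hand" region
events = np.array([
    [2, 1, 0.001, 1],   # inside the hand region
    [3, 2, 0.002, -1],  # inside the hand region
    [0, 0, 0.003, 1],   # background event (e.g., from head motion)
    [5, 3, 0.004, -1],  # background event
])
hand_events = filter_events_by_mask(events, mask)
print(len(hand_events))                                # 2
print(events_to_count_image(hand_events, H, W).sum())  # 2
```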