Benchmarking 2D Egocentric Hand Pose Datasets

Olga Taran, Damian M. Manzone, Jose Zariffa

TL;DR

Despite the availability of numerous egocentric databases intended for 2D hand pose estimation, the majority are tailored for specific use cases. The H2O and GANerated Hands datasets emerge as the most promising real and synthetic datasets, respectively.

Abstract

Hand pose estimation from egocentric video has broad implications across various domains, including human-computer interaction, assistive technologies, activity recognition, and robotics, making it a topic of significant research interest. The efficacy of modern machine learning models depends on the quality of data used for their training. Thus, this work is devoted to the analysis of state-of-the-art egocentric datasets suitable for 2D hand pose estimation. We propose a novel protocol for dataset evaluation, which encompasses not only the analysis of stated dataset characteristics and assessment of data quality, but also the identification of dataset shortcomings through the evaluation of state-of-the-art hand pose estimation models. Our study reveals that despite the availability of numerous egocentric databases intended for 2D hand pose estimation, the majority are tailored for specific use cases. There is no ideal benchmark dataset yet; however, H2O and GANerated Hands datasets emerge as the most promising real and synthetic datasets, respectively.

Paper Structure (5 sections, 5 figures, 2 tables)

This paper contains 5 sections, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Egocentric datasets image examples.
  • Figure 2: Ego3DHands dataset examples.
  • Figure 3: Problematic features of some datasets.
  • Figure 4: DRMS error (vertical axis) with respect to the confidence of joints' estimation (horizontal axis). Only joints with a confidence equal to or greater than the threshold value depicted on the horizontal axis are considered in the calculation of the DRMS error. MediaPipe does not provide estimation confidence, so we assume its values to be constant. For (a) and (b), only one hand was in the field of view, so no hand detector was required, and the data were split between object and no-object interactions. For (c-f), two hands were in the field of view and two different hand detectors were tested.
  • Figure 5: Percentage of Correctly detected Keypoints (PCK; vertical axis) with respect to the accepted deviation (in pixels, horizontal axis) between the ground truth and all estimated joints, confidence $\ge 0$.
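
The two metrics named in the captions of Figures 4 and 5 can be illustrated with a minimal sketch. It assumes that DRMS is the root mean square of the per-joint Euclidean distances between estimated and ground-truth 2D keypoints, and that PCK counts a joint as correct when that distance does not exceed the accepted deviation in pixels. The function names `drms_error` and `pck` and the sample coordinates are hypothetical and are not taken from the paper.

```python
import math


def drms_error(pred, gt, conf, threshold=0.0):
    """Root mean square of per-joint Euclidean distances (assumed DRMS definition).

    Only joints whose confidence is >= threshold contribute.
    Returns None when no joint passes the threshold.
    """
    squared = [
        (px - gx) ** 2 + (py - gy) ** 2
        for (px, py), (gx, gy), c in zip(pred, gt, conf)
        if c >= threshold
    ]
    if not squared:
        return None
    return math.sqrt(sum(squared) / len(squared))


def pck(pred, gt, max_dev_px):
    """Percentage of joints whose error is within max_dev_px pixels."""
    hits = sum(
        1
        for (px, py), (gx, gy) in zip(pred, gt)
        if math.hypot(px - gx, py - gy) <= max_dev_px
    )
    return 100.0 * hits / len(pred)


# Hypothetical 2D keypoints (pixels) for three joints; errors are 0, 5 and 6 px.
pred = [(0.0, 0.0), (3.0, 4.0), (10.0, 10.0)]
gt = [(0.0, 0.0), (0.0, 0.0), (10.0, 16.0)]
conf = [0.9, 0.5, 0.2]

print(drms_error(pred, gt, conf, threshold=0.5))  # uses the first two joints only
print(pck(pred, gt, max_dev_px=5))                # two of three joints within 5 px
```

For a model that reports no confidence, as the Figure 4 caption notes for MediaPipe, one option under this sketch is to pass a constant list such as `[1.0] * len(pred)`, so every joint passes any threshold up to that constant.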