ChildPlay-Hand: A Dataset of Hand Manipulations in the Wild

Arya Farkhondeh; Samy Tafasca; Jean-Marc Odobez

TL;DR

The paper introduces ChildPlay-Hand, a third-person, in-the-wild dataset of hand manipulations, and benchmarks various spatio-temporal and segmentation networks on it, exploring body vs. hand-region information and comparing pose and RGB modalities. The findings suggest that ChildPlay-Hand is a challenging new benchmark for modeling HOI in the wild.

Abstract

Hand-Object Interaction (HOI) is gaining significant attention, particularly with the creation of numerous egocentric datasets driven by AR/VR applications. However, third-person view HOI has received less attention, especially in terms of datasets. Most third-person view datasets are curated for action recognition tasks and feature pre-segmented clips of high-level daily activities, leaving a gap for in-the-wild datasets. To address this gap, we propose ChildPlay-Hand, a novel dataset that includes person and object bounding boxes, as well as manipulation actions. ChildPlay-Hand is unique in: (1) providing per-hand annotations; (2) featuring videos in uncontrolled settings with natural interactions, involving both adults and children; (3) including gaze labels from the ChildPlay-Gaze dataset for joint modeling of manipulations and gaze. The manipulation actions cover the main stages of an HOI cycle, such as grasping, holding or operating, and different types of releasing. To illustrate the value of the dataset, we study two tasks: object in hand detection (OiH), i.e., whether a person has an object in their hand, and manipulation stages (ManiS), which is more fine-grained and targets the main stages of manipulation. We benchmark various spatio-temporal and segmentation networks, exploring body vs. hand-region information and comparing pose and RGB modalities. Our findings suggest that ChildPlay-Hand is a challenging new benchmark for modeling HOI in the wild.

Paper Structure (22 sections, 2 equations, 9 figures, 3 tables)

Introduction
Related Datasets
Hand-Object Interaction (HOI) datasets
Action Recognition
Temporal Action Segmentation (TAS)
ChildPlay-Hand
Hand Interactions
Annotation Protocol
Annotation Statistics
Comparison to Other Datasets
Experiments
Benchmarked Tasks
OiH and ManiS Recognition
OiH and ManiS Segmentation
Results
...and 7 more sections

Figures (9)

Figure 1: Sample instances from the ChildPlay-Hand dataset with person bounding boxes and the per-hand object bounding boxes and corresponding action classes.
Figure 2: Distribution of hand action classes in the dataset. We show the distribution in frames (top) and events (bottom).
Figure 3: Distribution of event duration (in frames) per action class. The violin plot shows the min, max and median values of each distribution.
Figure 4: Class-wise frame-based: Precision, Recall, and F1.
Figure 5: Class-wise segmental: Precision, Recall, and F1.
...and 4 more figures
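
The class-wise frame-based precision, recall, and F1 named in the Figure 4 caption can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function name `classwise_frame_metrics`, the class label strings, and the convention of returning 0.0 when a denominator is zero are all assumptions made for illustration.

```python
from collections import Counter


def classwise_frame_metrics(y_true, y_pred):
    """Per-class precision, recall and F1 over two frame-level label sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth_label, pred_label in zip(y_true, y_pred):
        if truth_label == pred_label:
            tp[truth_label] += 1
        else:
            fp[pred_label] += 1   # predicted class gets a false positive
            fn[truth_label] += 1  # true class gets a false negative

    metrics = {}
    for c in sorted(set(y_true) | set(y_pred)):
        # Assumed convention: an undefined ratio is reported as 0.0.
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[c] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# Hypothetical per-frame labels for one hand; the class names are placeholders.
truth = ["none", "grasp", "hold", "hold", "hold", "release"]
pred = ["none", "hold", "hold", "hold", "release", "release"]
scores = classwise_frame_metrics(truth, pred)
for name, values in scores.items():
    print(name, values)
```

Frame-based scores of this kind weight every frame equally, so long events dominate; the segmental scores named in the Figure 5 caption are computed per event and are not covered by this sketch.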
