HOH: Markerless Multimodal Human-Object-Human Handover Dataset with Large Object Count

Noah Wiederhold, Ava Megyeri, DiMaggio Paris, Sean Banerjee, Natasha Kholgade Banerjee

TL;DR

The HOH dataset targets markerless, natural human-human handover, capturing 2,720 interactions across 136 objects with multi-view RGB-D, skeletons, and aligned 3D models. It provides rich ground truth, including 2D/3D segmentations, grasp taxonomy labels, comfort ratings, and precise time frames for key handover events, enabling data-driven grasp, orientation, and trajectory estimation. The work details a comprehensive data pipeline, from acquisition and synchronization to ground-truth annotation and mask tracking, along with the computing resources involved and a long-term preservation plan. HOH is the only fully markerless handover dataset and the largest handover dataset to date. It supports future applications in handover parameter estimation and human-robot collaboration, with open-source code and object models to foster community use and extension. Its scale, diversity of objects, and fully markerless capture address limitations of marker-based datasets and pave the way for robust vision- and learning-based handover algorithms.

Abstract

We present the HOH (Human-Object-Human) Handover Dataset, a large object count dataset with 136 objects, to accelerate data-driven research on handover studies, human-robot handover implementation, and artificial intelligence (AI) for handover parameter estimation from 2D and 3D data of person interactions. HOH contains multi-view RGB and depth data, skeletons, fused point clouds, grasp type and handedness labels, 2D and 3D segmentations of the object, giver hand, and receiver hand, giver and receiver comfort ratings, and paired object metadata and aligned 3D models for 2,720 handover interactions spanning 136 objects and 20 giver-receiver pairs (40 with role reversal) organized from 40 participants. We also show experimental results of neural networks trained using HOH to perform grasp, orientation, and trajectory prediction. As the only fully markerless handover capture dataset, HOH represents natural human-human handover interactions, overcoming challenges with marker-based datasets, which require specific suiting for body tracking and lack high-resolution hand tracking. To date, HOH is the largest handover dataset in number of objects, participants, pairs with role reversal accounted for, and total interactions captured.

Paper Structure (30 sections, 38 figures, 2 tables)


Figures (38)

  • Figure 1: A dataset informational card for HOH
  • Figure 2: Example 3D visualizations of full scene point clouds at 5 time points during a handover interaction, with Frame G (point of first giver contact) in the leftmost column, Frame T (point of transfer) in the center column, and Frame R (point of last receiver contact) in the rightmost column. The giver hand is highlighted magenta and the receiver hand is highlighted gold.
  • Figure 3: Template of the form completed by participants during a data collection session. The header changes depending on whether a participant is given the role of GIVER or RECEIVER.
  • Figure 4: Template of the form completed by experimenters during a data collection session.
  • Figure 5: Template of the form completed by experimenters after a data collection session. Note that the form is truncated in length for display purposes, but the actual form extends to serial index 68.
  • ...and 33 more figures