Early Detection of Human Handover Intentions in Human-Robot Collaboration: Comparing EEG, Gaze, and Hand Motion

Parag Khanna, Nona Rajabi, Sumeyra U. Demir Kanik, Danica Kragic, Mårten Björkman, Christian Smith

TL;DR

This work tackles early detection of handover intent in human-robot collaboration by comparing EEG, gaze, and hand-motion signals within the same experimental setup. The authors build intention detectors for each modality and evaluate their performance before and after movement onset, using AUC-ROC as the metric under nested cross-validation. Results show that gaze provides the fastest and most accurate handover classification, with pre-movement signals available for planning, while EEG is noisier and motion-based cues are delayed but competitive. The work demonstrates the value of multimodal fusion to boost weaker modalities and provides a publicly relevant dataset to advance real-time HRC control and future online systems.
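
To make the evaluation protocol concrete, the following is a minimal sketch of AUC-ROC scoring inside a nested cross-validation loop. It is not the authors' pipeline: the synthetic two-feature data, the mean-difference scorer (a toy stand-in for a real classifier such as LDA), the fold counts, and all function names are assumptions made for illustration.

```python
import random

def auc_roc(labels, scores):
    """AUC-ROC as the probability that a random positive outranks a random
    negative, with ties counted as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def k_folds(indices, k):
    """Split indices into k disjoint folds and yield (train, test) pairs."""
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

def fit_scorer(X, y, train, n_feat):
    """Hypothetical toy model: project the first n_feat features onto the
    difference of the two class means estimated from the training indices."""
    def class_mean(cls, d):
        vals = [X[i][d] for i in train if y[i] == cls]
        return sum(vals) / len(vals)
    w = [class_mean(1, d) - class_mean(0, d) for d in range(n_feat)]
    return lambda x: sum(wd * xd for wd, xd in zip(w, x))

def evaluate(X, y, train, test, n_feat):
    """Fit on train indices and return AUC-ROC on test indices."""
    score = fit_scorer(X, y, train, n_feat)
    return auc_roc([y[i] for i in test], [score(X[i]) for i in test])

def nested_cv(X, y, candidates, outer_k=5, inner_k=3):
    """Return one held-out AUC-ROC per outer fold."""
    outer_aucs = []
    for train, test in k_folds(list(range(len(y))), outer_k):
        # Inner loop: pick the hyperparameter using the training data only.
        def inner_auc(n_feat):
            aucs = [evaluate(X, y, tr, va, n_feat)
                    for tr, va in k_folds(train, inner_k)]
            return sum(aucs) / len(aucs)
        best = max(candidates, key=inner_auc)
        # Outer loop: refit on all training data, score the held-out fold once.
        outer_aucs.append(evaluate(X, y, train, test, best))
    return outer_aucs

# Synthetic data (an assumption): feature 0 carries the class signal,
# feature 1 is pure noise; labels alternate so every fold has both classes.
rng = random.Random(0)
y = [i % 2 for i in range(40)]
X = [[label + rng.gauss(0, 1), rng.gauss(0, 1)] for label in y]
aucs = nested_cv(X, y, candidates=[1, 2])
print(aucs)
```

The point of the nesting is that the hyperparameter (here, how many features to use) is selected without ever seeing the outer test fold, so the outer AUC-ROC values are not biased by model selection.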

Abstract

Human-robot collaboration (HRC) relies on accurate and timely recognition of human intentions to ensure seamless interactions. Among common HRC tasks, human-to-robot object handovers have been studied extensively for planning the robot's actions during object reception, assuming the human intention for object handover. However, distinguishing handover intentions from other actions has received limited attention. Most research on handovers has focused on visually detecting motion trajectories, which often results in delays or false detections when trajectories overlap. This paper investigates whether human intentions for object handovers are reflected in non-movement-based physiological signals. We conduct a multimodal analysis comparing three data modalities: electroencephalogram (EEG), gaze, and hand-motion signals. Our study aims to distinguish between handover-intended human motions and non-handover motions in an HRC setting, evaluating each modality's performance in predicting and classifying these actions before and after human movement initiation. We develop and evaluate human intention detectors based on these modalities, comparing their accuracy and timing in identifying handover intentions. To the best of our knowledge, this is the first study to systematically develop and test intention detectors across multiple modalities within the same experimental context of human-robot handovers. Our analysis reveals that handover intention can be detected from all three modalities. Nevertheless, gaze signals are the earliest as well as the most accurate to classify the motion as intended for handover or non-handover.

Paper Structure

This paper contains 7 sections, 4 equations, 10 figures, 6 tables.

Figures (10)

  • Figure 1: Experimental Setup: The participant sits across a table from the Baxter robot, wearing the EEG cap and Tobii eye tracker glasses. An Azure Kinect RGBD camera is installed on the robot's torso.
  • Figure 2: Experimental conditions and timing diagram, adapted from Fig. 2 in nonaEEG. Visual instructions for the three task conditions: (a) solo, (b) handover, and (c) joint actions. (d) Timing diagram of the experiment with $t=0$ at the Go! signal.
  • Figure 3: Grand average over 13 subjects for the two conditions (blue: handover, orange: non-handover). (a) ERP gathered from the motor cortex (channels: C3, C4, Cz, CP1, CP2, FC1, FC2). (b) ERDS of the mu, beta, and gamma bands at channel Cz, with the variance highlighted by the shaded region.
  • Figure 4: Detecting handover intention. Results of LDA (a, b, c) and LSTM (d, e, f) training on increasing time windows for (a, d) EEG, (b, e) gaze, and (c, f) hand motion. Different colors show individual participants, with the black line as the median performance and shaded areas as the standard error. The x-axis shows the end time of the training window, and the dashed line at 0.0 marks movement onset.
  • Figure 5: Comparing the handover detection performance of (a) LDA and (b) LSTM models across modalities. The graphs show the median performance for all modalities, with the shaded area indicating the 25th and 75th percentiles.
  • ...and 5 more figures
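
The "increasing time windows" in the Figure 4 caption can be sketched as follows. This is only an illustration under stated assumptions: the windows are taken to share a fixed start and grow at the end, times are in milliseconds relative to movement onset at 0, and the start, stop, and step values, as well as the helper names `growing_windows` and `crop`, are hypothetical rather than taken from the paper.

```python
def growing_windows(start_ms, stop_ms, step_ms):
    """Return (start, end) pairs with a fixed start and an end that grows by
    step_ms up to and including stop_ms."""
    return [(start_ms, end)
            for end in range(start_ms + step_ms, stop_ms + 1, step_ms)]

def crop(times_ms, samples, window):
    """Keep the samples whose timestamp lies in the half-open window [start, end)."""
    start, end = window
    return [s for t, s in zip(times_ms, samples) if start <= t < end]

# Toy trial (an assumption): one sample every 250 ms from -500 ms to +500 ms.
times_ms = [-500, -250, 0, 250, 500]
samples = ["a", "b", "c", "d", "e"]

for window in growing_windows(-500, 500, 250):
    # A detector would be trained and scored on each cropped trial; windows
    # ending at or before 0 use pre-movement data only.
    print(window, crop(times_ms, samples, window))
```

Plotting one score per window against the window's end time gives a curve of the kind the caption describes, with everything left of 0 reflecting what can be detected before the hand starts to move.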