Continuous ErrP detections during multimodal human-robot interaction

Su Kyoung Kim, Michael Maurus, Mathias Trampler, Marc Tabie, Elsa Andrea Kirchner

TL;DR

This work tackles continuous ErrP detection in long-duration, multimodal human-robot interaction in which a robot communicates its intentions verbally and via gestures. It introduces a framework that uses forward and backward sliding-window feature extraction with an online PA1 classifier to enable asynchronous ErrP detection, evaluated in an RH5 Manus lunar scenario with speech and pointing actions. The study achieves an average balanced accuracy of 91% across 9 subjects, with notable inter-subject variability that points to the potential of per-subject customization of feature selection. The findings indicate the feasibility of continuous ErrP-based intrinsic feedback in interactive reinforcement learning and multimodal HRI, with future work aimed at automatic per-subject feature optimization and at applying ErrP signals to online robot-learning loops.

Abstract

Human-in-the-loop approaches are of great importance for robot applications. In the presented study, we implemented a multimodal human-robot interaction (HRI) scenario in which a simulated robot communicates with its human partner through speech and gestures. The robot announces its intention verbally and selects the appropriate action using pointing gestures. The human partner, in turn, evaluates whether the robot's verbal announcement (intention) matches the action (pointing gesture) chosen by the robot. In cases where the robot's verbal announcement does not match its action choice, we expect error-related potentials (ErrPs) in the human electroencephalogram (EEG). These intrinsic evaluations of robot actions by humans, evident in the EEG, were recorded in real time, continuously segmented online, and classified asynchronously. For feature selection, we propose an approach that combines forward and backward sliding windows to train a classifier. We achieved an average classification performance of 91% across 9 subjects. As expected, we also observed relatively high variability between subjects. In the future, the proposed approach will be extended to allow for customization of feature selection: the best combinations of forward and backward sliding windows will be selected automatically to account for inter-subject variability in classification performance. In addition, we plan to use the intrinsic human error evaluation, evident as an ErrP in the error case, in interactive reinforcement learning to improve multimodal human-robot interaction.
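
The forward and backward sliding windows mentioned in the abstract can be illustrated with a minimal sketch. It assumes that forward windows are anchored at the onset of the robot's movement and slide toward the gesture onset, while backward windows are anchored at the gesture onset and slide back toward the movement onset. The function names, window width, step size, and sample indices are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Window = Tuple[int, int]  # half-open sample range [start, stop)


def forward_windows(start: int, end: int, width: int, step: int) -> List[Window]:
    """Windows anchored at `start` that slide forward toward `end`."""
    windows = []
    lo = start
    while lo + width <= end:
        windows.append((lo, lo + width))
        lo += step
    return windows


def backward_windows(start: int, end: int, width: int, step: int) -> List[Window]:
    """Windows anchored at `end` that slide backward toward `start`."""
    windows = []
    hi = end
    while hi - width >= start:
        windows.append((hi - width, hi))
        hi -= step
    return windows


# Hypothetical sample indices for one episode (assumption, not from the paper).
movement_onset, gesture_onset = 0, 550

fw = forward_windows(movement_onset, gesture_onset, width=200, step=100)
bw = backward_windows(movement_onset, gesture_onset, width=200, step=100)

# One possible "combination": the union of both window sets, in time order.
combined = sorted(set(fw) | set(bw))
```

Because the interval length here is not a multiple of the step size, the two window sets do not coincide: the forward set is aligned to the movement onset and the backward set to the gesture onset. A per-subject customization, as planned by the authors, could then amount to choosing which subset of `combined` is used for training.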

Paper Structure (6 sections, 3 figures, 1 table)

This paper contains 6 sections, 3 figures, 1 table.

Figures (3)

  • Figure 1: Scenario: The simulated robot named RH5 Manus verbally announces which tool it intends to select and then performs a corresponding pointing action. The human partner in turn evaluates whether the robot's verbal announcement matches the robot's action. For example, if the robot verbally announces that the hammer will be selected and then points to the hammer, the robot's action is correct. In this case, we do not expect any error-related potentials. However, if the robot's pointing action does not match its verbal announcement, ErrPs will be evoked during the execution of the robot's pointing gesture.
  • Figure 2: Experiment design. (A) Episode: An episode begins with the start of the robot's verbal announcement and ends with the return to the initial position. (B) Concept of forward and backward sliding windows used for training a classifier: Features are extracted from the time period between the onset of movement and the onset of gesture using forward and backward sliding windows. Note that we divided the robot's action into different phases (directional movements and gesture movements), but the robot performs a continuous action to point to one of three objects. (C) Feature selection during training for correct and incorrect episodes and during continuous testing. Evaluation is based on the marked time period.
  • Figure 3: EEG preprocessing and classification: EEG data were segmented, normalized, decimated, and bandpass filtered. The xDAWN spatial filter (Rivet et al., 2009) was applied to enhance the signal-to-noise ratio and to reduce dimensionality. Features were extracted from seven pseudo channels. Details of feature selection and extraction are shown in Fig. 2. The online passive-aggressive algorithm, variant 1 (PA1; Crammer et al., 2006), was used for classification.
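
The PA1 classifier named in the Figure 3 caption can be sketched with the standard binary PA-I update rule from Crammer et al. (2006): for a sample x with label y in {-1, +1}, compute the hinge loss l = max(0, 1 - y (w · x)), the step size tau = min(C, l / ||x||²), and update w ← w + tau · y · x. The class below is a minimal illustration of that rule only. The class name, the aggressiveness parameter value, and the two-dimensional toy data are assumptions, and the sketch does not reproduce the authors' preprocessing or feature pipeline.

```python
from typing import List, Sequence


class PA1:
    """Minimal binary passive-aggressive classifier (PA-I variant)."""

    def __init__(self, n_features: int, C: float = 1.0) -> None:
        self.w: List[float] = [0.0] * n_features
        self.C = C  # aggressiveness parameter; caps the step size

    def score(self, x: Sequence[float]) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def predict(self, x: Sequence[float]) -> int:
        return 1 if self.score(x) >= 0 else -1

    def update(self, x: Sequence[float], y: int) -> float:
        """One online update; returns the hinge loss before the update."""
        loss = max(0.0, 1.0 - y * self.score(x))
        sq_norm = sum(xi * xi for xi in x)
        if loss > 0.0 and sq_norm > 0.0:
            tau = min(self.C, loss / sq_norm)  # PA-I step size
            self.w = [wi + tau * y * xi for wi, xi in zip(self.w, x)]
        return loss


# Toy data (hypothetical): +1 could stand for an ErrP window, -1 for a non-ErrP window.
clf = PA1(n_features=2, C=1.0)
stream = [([1.0, 0.0], 1), ([0.0, 1.0], -1)]
losses = [clf.update(x, y) for x, y in stream]

predictions = [clf.predict(x) for x, _ in stream]
```

The update is "passive" when a sample is already classified with a margin of at least 1 (the loss is zero and the weights stay unchanged) and "aggressive" otherwise, which is what makes the rule suitable for sample-by-sample online training. In the study's setting the input would be the features extracted from the seven pseudo channels rather than this toy vector.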