Gaze-Based Intention Recognition for Human-Robot Collaboration

Valerio Belcamino, Miwa Takase, Mariya Kilina, Alessandro Carfì, Akira Shimada, Sota Shimizu, Fulvio Mastrogiovanni

TL;DR

The paper addresses the challenge of online human intent recognition in human-robot collaboration during assembly tasks. It compares gaze-based intention estimation using a headset-based eye tracker and Unreal Engine against an IMU-based LSTM classifier, both embedded in a Hierarchical Task Network planning framework. Results show that gaze-based perception achieves performance comparable to IMUs in effectiveness and user satisfaction, with trade-offs in hardware and processing; idle and total assembly times differ but reflect similar coordination capabilities. The work highlights the potential of gaze as a viable, lower-sensor option and suggests future fusion of modalities to handle more complex and anticipatory human actions.

Abstract

This work tackles the intent recognition problem in Human-Robot Collaborative assembly scenarios. Specifically, we consider the interactive assembly of a wooden stool in which the robot fetches the pieces in the correct order and the human builds the parts following the instruction manual. Intent recognition is limited to idle state estimation, which is needed to ensure better synchronization between the two agents. We compared two distinct solutions, one based on wearable sensors and one based on eye tracking, both integrated into the perception pipeline of a flexible planning architecture based on Hierarchical Task Networks. At runtime, the wearable sensing module uses the raw measurements from four 9-axis Inertial Measurement Units positioned on the wrists and hands of the user as input for a Long Short-Term Memory Network. The eye tracking, on the other hand, relies on a Head Mounted Display and Unreal Engine. We tested the effectiveness of the two approaches with 10 participants, each of whom tried both options in alternating order. We collected explicit metrics about the attractiveness and efficiency of the two techniques through User Experience Questionnaires, as well as implicit criteria regarding the classification time and the overall assembly time. The results show that the two methods reach comparable performance in terms of both effectiveness and user preference. Future development could aim at combining the two approaches to allow the recognition of more complex activities and to anticipate the user's actions.
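To make the wearable-sensing input concrete, the following is a minimal sketch of how readings from four 9-axis IMUs could be packed into fixed-length sequences of 36 values per time step, the kind of input a sequence model such as an LSTM typically consumes. Only the sensor count and the number of axes come from the abstract; the window length, the stride, and the names `flatten_frame` and `sliding_windows` are assumptions made for illustration and do not describe the authors' implementation.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Sequence

NUM_IMUS = 4                          # from the abstract: wrists and hands
AXES_PER_IMU = 9                      # from the abstract: 9-axis units
FEATURES = NUM_IMUS * AXES_PER_IMU    # 36 values per time step

WINDOW = 50   # assumption: time steps per window (not given in the source)
STRIDE = 10   # assumption: new samples between consecutive windows


def flatten_frame(frame: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate one reading per IMU into a single 36-value vector."""
    if len(frame) != NUM_IMUS or any(len(imu) != AXES_PER_IMU for imu in frame):
        raise ValueError("expected 4 IMU readings with 9 values each")
    return [value for imu in frame for value in imu]


def sliding_windows(
    frames: Iterable[Sequence[Sequence[float]]],
    window: int = WINDOW,
    stride: int = STRIDE,
) -> Iterator[List[List[float]]]:
    """Yield overlapping windows of shape (window, 36) from a frame stream."""
    buffer: Deque[List[float]] = deque(maxlen=window)
    since_last = 0
    for frame in frames:
        buffer.append(flatten_frame(frame))
        since_last += 1
        # Emit once the buffer is full and enough new samples have arrived.
        if len(buffer) == window and since_last >= stride:
            since_last = 0
            yield list(buffer)
```

Each yielded window could then be passed to a classifier that outputs an idle or not-idle label. Overlapping windows are a common choice for online recognition because they let the classifier update more often than once per full window.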

Paper Structure (5 sections, 4 figures)

Figures (4)

  • Figure 1: A top view of the experimental scenario, showing the interaction between human and robot.
  • Figure 2: The architecture diagram of the system. The HTN planner updates its state activating the perception modules and can move the robot using the Joint Trajectory Client. The right side of the chart depicts the perception modules with the associated type of sensor.
  • Figure 3: The picture shows a comparison of the results obtained by the idleness classification based on IMU sensors and HoloLens. Each subplot, from left to right, is associated with one of the criteria provided by the User Experience Questionnaire.
  • Figure 4: The two boxplots show the time needed to complete the Wait action for the two interfaces.
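
The captions for Figures 2 and 4 mention a planner that activates perception modules and a Wait action whose completion time is measured. The source does not describe how that action is implemented, so the following is only a hypothetical sketch: a wait routine that polls an idle detector (IMU-based or gaze-based) and returns the elapsed time. The function name `wait_until_idle`, the polling period, and the timeout are all assumptions.

```python
import time
from typing import Callable


def wait_until_idle(
    is_idle: Callable[[], bool],
    poll_s: float = 0.05,      # assumption: polling period in seconds
    timeout_s: float = 30.0,   # assumption: give up after this long
) -> float:
    """Block until the perception callback reports idleness.

    Returns the elapsed time in seconds, i.e. the kind of quantity a
    per-interface timing comparison would be built from.
    """
    start = time.monotonic()
    while not is_idle():
        if time.monotonic() - start >= timeout_s:
            raise TimeoutError("idle state not detected before timeout")
        time.sleep(poll_s)
    return time.monotonic() - start
```

Because the detector is passed in as a callback, the same routine can be timed with either perception module, which keeps the comparison between the two interfaces symmetric.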