Gaze-based intention estimation: principles, methodologies, and applications in HRI

Anna Belardinelli

TL;DR

This survey synthesizes how gaze dynamics serve as anticipatory signals linking human intentions to actions in Human-Robot Interaction (HRI), bridging cognitive science with practical applications in teleoperation, assistive robotics, and advanced driver-assistance systems (ADAS). It maps gaze features (e.g., point of regard (POR), sequences of fixated areas of interest (AOIs)) to proximal and motor intentions, and surveys predictive models (HMMs, DBNs, POMDPs, LSTMs) that leverage temporal context for online inference. Key contributions include a taxonomy of gaze features and models, domain-specific implementations (computer vision, HRI, and ADAS), and a critical discussion of limitations such as multitasking, trust, and 3D gaze localization. The paper highlights the practical potential of gaze-informed, proactive assistance while calling for principled, personalized designs and deeper integration of 3D gaze data with scene understanding to enable robust, real-world systems.

Abstract

Intention prediction has become a relevant field of research in Human-Machine and Human-Robot Interaction. Indeed, any artificial system (co-)operating with and alongside humans, designed to assist and coordinate its actions with a human partner, would benefit from first inferring the human's current intention. To spare the user the cognitive burden of explicitly stating their goals, this inference relies mostly on behavioral cues deemed indicative of the current action. It has long been known that eye movements are highly anticipatory of the individual steps unfolding during a task; hence they can serve as a very early and reliable behavioral cue for intention recognition. This review aims to draw a connection between insights from the psychological literature on visuomotor control and relevant applications of gaze-based intention recognition in technical domains, with a focus on teleoperated and assistive robotic systems. Starting from the cognitive principles underlying the relationship between intentions, eye movements, and action, the review considers the use of eye tracking and gaze-based models for intent recognition in Human-Robot Interaction, along with prevalent methodologies and their diverse applications. Finally, special consideration is given to relevant human factors issues and current limitations to be factored in when designing such systems.

Paper Structure (20 sections, 2 equations, 3 figures, 4 tables)

Figures (3)

  • Figure 1: Fields and sub-fields of application for gaze-based intention estimation, as surveyed in this review.
  • Figure 2: A graphical representation of the cascade of intention levels, from abstract distal intentions to motor intentions directly translating into motor actions in a situated way (redrawn from Pacherie2008).
  • Figure 3: Information exchange between the functional modules regulating perception and action: the relation between gaze control, visual processing and action control is orchestrated by the schema system, which sets the agenda for locations to look at, objects to look for and movements to plan. The solid arrows show the information flow as sketched by Land09, here redrawn; the dashed arrow represents a further information exchange suggested here and demonstrated in Brouwer2009, Belardinelli2016a, and Belardinelli2016b.