Gaze Estimation for Human-Robot Interaction: Analysis Using the NICO Platform
Matej Palider, Omar Eldardeer, Viktor Kocur
TL;DR
This paper evaluates gaze estimation for human-robot interaction in a shared workspace. It introduces an annotated dataset collected on the NICO platform and benchmarks four state-of-the-art gaze models, using stereo-camera data to estimate where gaze lands on the workspace plane. The key finding is that angular accuracy matches general-purpose benchmarks, yet gaze localization on the plane remains imprecise (best median error 16.48 cm). This result informs how gaze should be integrated as a modality in HRI systems and highlights the value of multimodal cues and temporal aggregation. The work offers recommendations for incorporating gaze information in HRI and releases the dataset and evaluation code for reproducibility.
Abstract
This paper evaluates current gaze estimation methods in an HRI context, specifically a shared workspace scenario. We introduce a new annotated dataset collected with the NICO robotic platform and evaluate four state-of-the-art gaze estimation models. The angular errors are close to those reported on general-purpose benchmarks. However, when expressed as distance in the shared workspace, the best median error is 16.48 cm, which quantifies the practical limitations of current methods. We conclude by discussing these limitations and offering recommendations on how best to integrate gaze estimation as a modality in HRI systems.
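To illustrate why a small angular error can become a large distance error on the workspace, the following minimal sketch intersects a gaze ray with a plane and measures how far the hit point moves when the gaze direction is perturbed. It is not the paper's evaluation code. The geometry (eye 0.5 m above a table at z = 0, target 0.6 m away, a 5 degree error) and the function names are hypothetical values chosen only for illustration.

```python
import numpy as np


def intersect_gaze_with_plane(origin, direction, plane_point, plane_normal):
    """Return the 3D point where a gaze ray hits a plane, or None if it misses."""
    direction = direction / np.linalg.norm(direction)
    denom = np.dot(plane_normal, direction)
    if abs(denom) < 1e-9:
        return None  # ray is parallel to the plane
    t = np.dot(plane_normal, plane_point - origin) / denom
    if t < 0:
        return None  # plane lies behind the gaze origin
    return origin + t * direction


def rotate_about_x(vector, angle_deg):
    """Rotate a 3D vector about the x axis by angle_deg degrees."""
    a = np.radians(angle_deg)
    rotation = np.array([
        [1.0, 0.0, 0.0],
        [0.0, np.cos(a), -np.sin(a)],
        [0.0, np.sin(a), np.cos(a)],
    ])
    return rotation @ vector


# Hypothetical setup (not from the paper): table plane at z = 0,
# eye 0.5 m above it, looking at a target 0.6 m away along y.
eye = np.array([0.0, 0.0, 0.5])
target = np.array([0.0, 0.6, 0.0])
plane_point = np.array([0.0, 0.0, 0.0])
plane_normal = np.array([0.0, 0.0, 1.0])

true_direction = target - eye
true_hit = intersect_gaze_with_plane(eye, true_direction, plane_point, plane_normal)

# Assumed angular error of 5 degrees in the vertical plane of the gaze.
noisy_direction = rotate_about_x(true_direction, 5.0)
noisy_hit = intersect_gaze_with_plane(eye, noisy_direction, plane_point, plane_normal)

planar_error_m = np.linalg.norm(noisy_hit - true_hit)
print(f"planar error: {planar_error_m * 100:.1f} cm")
```

Because the ray meets the table at a shallow angle, the same angular error produces a larger displacement the farther the target is from the observer, which is one reason angular and planar metrics can tell different stories.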
