Sticky-Glance: Robust Intent Recognition for Human Robot Collaboration via Single-Glance

Yuzhi Lai, Shenghai Yuan, Peizheng Li, Andreas Zell

TL;DR

This paper proposes an object-centric gaze grounding framework that stabilizes intent through a sticky-glance algorithm, which jointly models geometric distance and direction trends. The framework enables high-readiness control and human-in-the-loop feedback, reducing task duration by nearly 10%.

Abstract

Gaze is a valuable means of communication for people with extremely limited motor capabilities. However, robust gaze-based intent recognition in multi-object environments is challenging due to gaze noise, micro-saccades, viewpoint changes, and dynamic objects. To address this, we propose an object-centric gaze grounding framework that stabilizes intent through a sticky-glance algorithm, jointly modeling geometric distance and direction trends. The inferred intent remains anchored to the object even under short glances of as few as 3 gaze samples, achieving a tracking rate of 0.94 for dynamic targets and a selection accuracy of 0.98 for static targets. We further introduce a continuous shared control and multi-modal interaction paradigm that enables high-readiness control and human-in-the-loop feedback, reducing task duration by nearly 10%. Experiments covering dynamic tracking, multi-perspective alignment, a baseline comparison, user studies, and ablation studies demonstrate improved robustness and efficiency and reduced workload compared to representative baselines.
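To make the idea of jointly scoring distance and direction concrete, the following is a minimal sketch and not the authors' algorithm. It assumes 2D gaze samples and 2D object centres (the paper works with object-level point clouds), a Gaussian distance term, a cosine direction term, and a fixed switching margin for the "sticky" behaviour. The names `glance_scores` and `sticky_update` and the parameters `sigma`, `w_dist`, and `margin` are all hypothetical.

```python
import math

def glance_scores(gaze, objects, sigma=0.1, w_dist=0.5):
    """Score each object from a short gaze window (hypothetical formulation).

    gaze:    list of (x, y) gaze samples, oldest first, at least two samples.
    objects: dict mapping object name -> (x, y) centre.
    Returns a dict mapping object name -> score in [0, 1].
    """
    last_x, last_y = gaze[-1]
    # Direction trend: net gaze displacement over the window.
    vx = gaze[-1][0] - gaze[0][0]
    vy = gaze[-1][1] - gaze[0][1]
    speed = math.hypot(vx, vy)

    scores = {}
    for name, (ox, oy) in objects.items():
        dx, dy = ox - last_x, oy - last_y
        dist = math.hypot(dx, dy)
        # Distance term: Gaussian falloff from the latest gaze sample.
        proximity = math.exp(-(dist ** 2) / (2 * sigma ** 2))
        # Direction term: cosine between gaze motion and the vector
        # to the object, clamped and mapped from [-1, 1] to [0, 1].
        if speed > 1e-9 and dist > 1e-9:
            cos = (vx * dx + vy * dy) / (speed * dist)
            cos = max(-1.0, min(1.0, cos))
            heading = 0.5 * (1.0 + cos)
        else:
            heading = 1.0  # no motion or gaze on the centre: stay neutral
        scores[name] = w_dist * proximity + (1.0 - w_dist) * heading
    return scores

def sticky_update(current, scores, margin=0.2):
    """Keep the current target unless a rival beats it by `margin`."""
    best = max(scores, key=scores.get)
    if current is None or current not in scores:
        return best
    if scores[best] > scores[current] + margin:
        return best
    return current

objects = {"cup": (0.0, 0.0), "bottle": (0.3, 0.0)}

# A three-sample glance moving toward the cup selects the cup.
target = sticky_update(None, glance_scores(
    [(0.12, 0.0), (0.08, 0.01), (0.05, 0.0)], objects))

# A small drift toward the bottle raises its score, but not by
# enough to clear the margin, so the target stays on the cup.
jitter_scores = glance_scores([(0.05, 0.0), (0.06, 0.0), (0.07, 0.0)], objects)
target_after_jitter = sticky_update(target, jitter_scores)
```

The margin is what makes the selection sticky in this sketch: a rival object must score clearly higher than the current target before the intent switches, so brief gaze noise does not cause flicker.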

Paper Structure

This paper contains 26 sections, 6 equations, 8 figures, 6 tables, 1 algorithm.

Figures (8)

  • Figure 1: Our proposed approach anchors scattered gaze points to intent objects, enabling efficient and natural interaction with a short glance.
  • Figure 2: Our system collects data through Meta ARIA glasses, then performs asynchronous off-device inference. Data is transmitted via Wi-Fi.
  • Figure 3: Illustration of the gaze trajectory (left) and the calculation for Sticky-Glance Intent Prediction (right).
  • Figure 4: Demonstration of the pipeline for object-level pointcloud association and the pipeline for multi-perspective alignment.
  • Figure 5: Overview of experimental scenarios. Scenario 1 evaluates the robustness of our proposed intent confidence algorithm. Scenario 2 evaluates the robustness of multi-perspective alignment. Scenarios 3-4 assess real-world robotic task execution at increasing complexity. Sequence tasks involve combinations of simple actions such as pick, put, pour, and position swap. Complicated tasks require the robot to handle occlusion and overlapping objects.
  • ...and 3 more figures