
Diegetic Graphical User Interfaces and Intuitive Control of Assistive Robots via Eye-gaze

Emanuel Nunez Sardinha, Marcela Munera, Nancy Zook, David Western, Virginia Ruiz Garate

TL;DR

This work introduces Diegetic Graphical User Interfaces (D-GUI), a screenless, gaze-driven control framework for assistive robots that embeds interactive buttons directly onto the robot via fiducial markers. By combining eye-gaze dwell processing with a weighted, marker-based button localization pipeline, the system achieves continuous 3D Cartesian control of a 3-DOF robotic arm. It was evaluated on the Yale-CMU-Berkeley (YCB) Block Pick and Place Protocol, obtaining a mean score of 13.71/16, favorable usability (SUS 75.36), and a moderate NASA-TLX workload (44.76). The implementation relies on ROS 2, MoveIt2, ArUco markers, and Docker for reproducible, real-time operation, and the package is released as open source for replication. Findings indicate that users can operate multiple interface elements with minimal practice and low cognitive load while maintaining safety and robustness, suggesting strong potential for screenless, co-designable assistive interfaces. Limitations include calibration drift, the lack of feedback channels, and the need for broader testing with disabled populations; future work targets multimodal feedback, rotation control, and benchmarking on real-life activities of daily living.
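The exact weighting used in the "weighted, marker-based button localization pipeline" is not given in this summary. The following is a minimal sketch under stated assumptions: each detected fiducial marker proposes a candidate 3D button centre in the camera frame (for example from its estimated pose plus a known offset), the candidates are fused by a weighted average, and the result is projected back to image pixels with an ideal pinhole model. The names `Candidate`, `fuse_candidates`, and `project_pinhole`, the weight values, and the camera intrinsics are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Candidate:
    """Hypothetical: one button-centre estimate proposed by one detected marker."""
    position: Vec3  # candidate button centre in the camera frame (metres)
    weight: float   # assumed confidence, e.g. larger for markers that appear bigger


def fuse_candidates(cands: Sequence[Candidate]) -> Vec3:
    """Fuse candidate positions with a weighted average (assumed weighting scheme)."""
    total = sum(c.weight for c in cands)
    if total <= 0:
        raise ValueError("need at least one candidate with positive weight")
    x, y, z = (sum(c.weight * c.position[i] for c in cands) / total for i in range(3))
    return (x, y, z)


def project_pinhole(p: Vec3, fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float]:
    """Project a camera-frame 3D point to pixels (ideal pinhole, no lens distortion)."""
    x, y, z = p
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (fx * x / z + cx, fy * y / z + cy)


if __name__ == "__main__":
    # Two markers disagree slightly; the more trusted one dominates the estimate.
    cands = [Candidate((0.10, 0.00, 0.50), 3.0), Candidate((0.12, 0.02, 0.50), 1.0)]
    centre = fuse_candidates(cands)
    print(centre, project_pinhole(centre, 600.0, 600.0, 320.0, 240.0))
```

Averaging several marker-derived candidates is one simple way to keep a button usable when some markers are occluded or poorly detected; a real system would also need lens-distortion handling, which this sketch omits.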

Abstract

Individuals with tetraplegia and similar forms of paralysis suffer physically and emotionally due to a lack of autonomy. Assistive robotic arms have been shown to increase living independence, helping these users regain part of this autonomy. However, users with paralysis pose unique challenges for the control of these devices. In this article, we present Diegetic Graphical User Interfaces, a novel, intuitive, and computationally inexpensive approach to gaze-controlled interfaces for robots. By pairing symbols with fiducial markers, interactive buttons can be defined in the real world, embedded easily into the environment, and triggered by the user via gaze. We apply this system to pilot a 3-degree-of-freedom robotic arm in precision pick-and-place tasks. The interface is placed directly on the robot to allow intuitive and direct interaction, eliminating the need to switch context between external screens, menus, and the robot. After calibration and a brief habituation period, twenty-one participants of varied backgrounds, ages, and eyesight conditions completed the Yale-CMU-Berkeley (YCB) Block Pick and Place Protocol to benchmark the system, achieving a mean score of 13.71 out of a maximum of 16.00 points. Participants reported good usability and user experience (System Usability Scale score of 75.36) and a low task workload (NASA-TLX score of 44.76). Results show that users can employ multiple interface elements to perform actions with minimal practice and a small cognitive load. To our knowledge, this is the first easily reconfigurable screenless system that enables Cartesian robot control entirely via gaze, without the need for eye or face gestures.
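A button "triggered via gaze" ultimately needs a per-frame test of whether the point of gaze falls on the button. A minimal sketch, assuming the button area has already been projected into the scene-camera image as a polygon (for example four corner pixels) and that the eye tracker reports the point of gaze in the same pixel coordinates. The even-odd ray-casting test below is a standard point-in-polygon method; whether the authors use this particular test is not stated in this summary, and `gaze_hits_button` is a hypothetical name.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(p: Point, poly: Sequence[Point]) -> bool:
    """Even-odd ray casting: cast a ray towards +x and count edge crossings."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through p can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def gaze_hits_button(gaze: Point, corners: Sequence[Point]) -> int:
    """Hypothetical helper: raw per-frame hit signal, 1 if gaze is on the button else 0."""
    return int(point_in_polygon(gaze, corners))


if __name__ == "__main__":
    # A button seen at an angle projects to a tilted quadrilateral in the image.
    quad = [(5.0, 0.0), (10.0, 5.0), (5.0, 10.0), (0.0, 5.0)]
    print(gaze_hits_button((5.0, 5.0), quad), gaze_hits_button((1.0, 1.0), quad))
```

This raw 0/1 signal is noisy because of fixation jitter and blinks, which is why it is typically smoothed and debounced before it drives the robot.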

Paper Structure

This paper contains 44 sections, 13 equations, 11 figures, and 5 tables.

Figures (11)

  • Figure 1: Concept for Diegetic Interface system.
  • Figure 2: Camera feed and eye-gaze processing pipeline for Diegetic User Interfaces, showing, from left to right: a) the setup with the head tracker and the diegetic buttons, where the scene and the point of gaze are captured; b) the pipeline stages, including marker recognition, button area calculation, candidate weighting, and projection back to 2D; c) the intersection check followed by signal debouncing; d) the output stage, applied to the robot.
  • Figure 3: Input signal processing sample, showing the raw input from the intersection check (top), the rolling average with limits and threshold marks (middle), and the final signal debounced with a Schmitt trigger (bottom). The parameters $a_{\text{on}}$ and $a_{\text{off}}$ set the activation and deactivation thresholds of the output signal.
  • Figure 4: User interface setup. a) Directional arrows for 3-DOF control: the top arrows control depth, and the bottom ones control planar movement relative to the user. b) External controls for opening/closing the gripper.
  • Figure 5: Diagram for the system software architecture.
  • ...and 6 more figures
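Figure 3 describes dwell processing as a rolling average of the raw intersection signal followed by a Schmitt trigger with thresholds $a_{\text{on}}$ and $a_{\text{off}}$. A minimal sketch, assuming a fixed-length window in which frames not yet observed count as misses (so activation requires a sustained dwell rather than a single hit), with the output switching on when the average reaches `a_on` and off when it falls to `a_off`. The class name, window length, and threshold values are placeholders, not the paper's parameters.

```python
from collections import deque


class DwellDebouncer:
    """Hypothetical sketch: rolling average plus Schmitt trigger over a 0/1 gaze-hit signal."""

    def __init__(self, window: int, a_on: float, a_off: float) -> None:
        if window < 1:
            raise ValueError("window must be at least 1 sample")
        if not 0.0 <= a_off < a_on <= 1.0:
            raise ValueError("expected 0 <= a_off < a_on <= 1")
        self.window = window
        self.samples = deque(maxlen=window)  # oldest samples drop out automatically
        self.a_on = a_on
        self.a_off = a_off
        self.active = False

    def update(self, hit: int) -> bool:
        """Feed one raw sample (0 or 1) and return the debounced button state."""
        self.samples.append(hit)
        # Divide by the full window so unseen frames count as misses.
        avg = sum(self.samples) / self.window
        if not self.active and avg >= self.a_on:
            self.active = True   # gaze has dwelt long enough: press
        elif self.active and avg <= self.a_off:
            self.active = False  # gaze has been away long enough: release
        return self.active


if __name__ == "__main__":
    d = DwellDebouncer(window=5, a_on=0.7, a_off=0.3)
    raw = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    print([d.update(h) for h in raw])
```

The gap between the two thresholds is what suppresses chatter: a flickering signal whose average hovers between `a_off` and `a_on` leaves the output unchanged, while the hold-over after gaze leaves gives a short, predictable release delay.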