The Midas Touch in Gaze vs. Hand Pointing: Modality-Specific Failure Modes and Implications for XR Interfaces

Mohammad Dastgheib, Fatemeh Pourmahdian

Abstract

Extended Reality (XR) interfaces impose both ergonomic and cognitive demands, yet current systems often force a binary choice between hand-based input, which can produce fatigue, and gaze-based input, which is vulnerable to the Midas Touch problem and precision limitations. We introduce the xr-adaptive-modality-2025 platform, a web-based open-source framework for studying whether modality-specific adaptive interventions can improve XR-relevant pointing performance and reduce workload relative to static unimodal interaction. The platform combines physiologically informed gaze simulation, an ISO 9241-9 multidirectional tapping task, and two modality-specific adaptive interventions: gaze declutter and hand target-width inflation. We evaluated the system in a 2 × 2 × 2 within-subjects design manipulating Modality (Hand vs. Gaze), UI Mode (Static vs. Adaptive), and Pressure (Yes vs. No). Results from N = 69 participants show that hand input yielded higher throughput than gaze input (5.17 vs. 4.73 bits/s), a lower error rate (1.8% vs. 19.1%), and lower NASA-TLX workload. Crucially, error profiles differed sharply by modality: gaze errors were predominantly slips (99.2%), whereas hand errors were predominantly misses (95.7%), consistent with the Midas Touch account. Of the two adaptive interventions, only gaze declutter executed in this dataset; it modestly reduced timeouts but not slips. Hand width inflation could not be evaluated because of a UI integration bug. These findings reveal modality-specific failure modes with direct implications for adaptive policy design and establish the platform as reproducible infrastructure for future studies.

Paper Structure (39 sections, 10 figures)

Figures (10)

  • Figure 1: Psychophysically grounded gaze proxy pipeline. Panel A (left): The three-stage simulation. Raw mouse input is differentiated to estimate angular velocity; velocity above 120 deg/s triggers saccadic suppression (the cursor freezes, then jumps to the new position); velocity below 30 deg/s triggers the fixation transform (Gaussian jitter with $\sigma \approx 0.5$ mm plus first-order lag smoothing). Panel B (center): Output signal examples. Fixation mode (B1) produces continuous spatial noise consistent with microsaccade statistics; saccade mode (B2) produces a cursor freeze and ballistic jump, reproducing the perceptual blind phase of real saccades. Panel C (right): The selection model, with a dwell-confirm tolerance ring (target radius + 10 px) that accommodates fixation jitter while keeping sensitivity well defined.
  • Figure 2: Task UI overview. ISO 9241-9 multidirectional tapping task: participants select highlighted targets on a central canvas. A side HUD shows modality and block-level feedback.
  • Figure 3: Primary performance by modality. (A) Throughput (bits/s): hand produced higher throughput than gaze. (B) Error rate (%): gaze showed a substantially higher error rate than hand. Error bars show 95% CI.
  • Figure 4: Error type composition by modality. Gaze errors are predominantly slips (accidental activations); hand errors are predominantly misses.
  • Figure 5: NASA-TLX overall workload (0 to 100) by modality. Gaze imposed higher subjective workload than hand. Error bars show 95% CI.
  • ...and 5 more figures
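
The three-stage gaze proxy and the tolerance ring described in the Figure 1 caption can be illustrated with a minimal Python sketch. This is not the platform's implementation: only the 120 deg/s and 30 deg/s thresholds, the 0.5 mm jitter, and the 10 px ring come from the caption. The pixel-per-degree and pixel-per-millimetre conversions, the lag coefficient, the pass-through behavior between the two thresholds, and all names (`GazeProxy`, `within_dwell_ring`) are assumptions made for illustration. The dwell duration is not given in the caption, so only the spatial check is shown.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Optional, Tuple

Point = Tuple[float, float]

# Values taken from the Figure 1 caption.
SACCADE_DEG_S = 120.0    # above this speed: saccadic suppression (cursor freezes)
FIXATION_DEG_S = 30.0    # below this speed: fixation transform (jitter + lag)
JITTER_SIGMA_MM = 0.5    # Gaussian jitter standard deviation
RING_PX = 10.0           # dwell tolerance ring = target radius + 10 px

# Assumed values, NOT from the source (hypothetical display geometry and filter).
PX_PER_DEG = 40.0        # pixels per degree of visual angle
PX_PER_MM = 4.0          # pixels per millimetre
LAG_ALPHA = 0.3          # first-order lag coefficient, 0 < alpha <= 1


@dataclass
class GazeProxy:
    rng: random.Random = field(default_factory=random.Random)
    prev_raw: Optional[Point] = None
    cursor: Optional[Point] = None
    in_saccade: bool = False

    def step(self, raw: Point, dt: float) -> Point:
        """Map one raw mouse sample (px) to a simulated gaze cursor position."""
        if self.prev_raw is None or self.cursor is None:
            self.prev_raw = self.cursor = raw
            return self.cursor

        # Stage 1: differentiate the raw input to estimate angular velocity.
        speed_deg_s = math.dist(raw, self.prev_raw) / dt / PX_PER_DEG
        self.prev_raw = raw

        # Stage 2: saccadic suppression. Freeze while fast, jump once it ends.
        if speed_deg_s > SACCADE_DEG_S:
            self.in_saccade = True
            return self.cursor
        if self.in_saccade:
            self.in_saccade = False
            self.cursor = raw
            return self.cursor

        # Stage 3: fixation transform. Add Gaussian jitter, then lag-smooth.
        if speed_deg_s < FIXATION_DEG_S:
            sigma_px = JITTER_SIGMA_MM * PX_PER_MM
            noisy = (raw[0] + self.rng.gauss(0.0, sigma_px),
                     raw[1] + self.rng.gauss(0.0, sigma_px))
            cx, cy = self.cursor
            self.cursor = (cx + LAG_ALPHA * (noisy[0] - cx),
                           cy + LAG_ALPHA * (noisy[1] - cy))
            return self.cursor

        # Between the two thresholds the caption is silent; pass through (assumption).
        self.cursor = raw
        return self.cursor


def within_dwell_ring(cursor: Point, center: Point, target_radius_px: float) -> bool:
    """True if the cursor lies inside the target radius plus the 10 px ring."""
    return math.dist(cursor, center) <= target_radius_px + RING_PX
```

In this sketch, a fast mouse movement leaves the returned cursor frozen at its previous position, and the first slow sample afterwards moves it directly to the new location. Subsequent slow samples drift around the raw position with small jitter, which the tolerance ring is meant to absorb.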