
SwEYEpinch: Exploring Intuitive, Efficient Text Entry for Extended Reality via Eye and Hand Tracking

Ziheng "Leo" Li, Xichen He, Mengyuan "Millie" Wu, Zeyi Tong, Haowen Wei, Benjamin Yang, Steven Feiner, Paul Sajda

Abstract

Despite steady progress, text entry in Extended Reality (XR) often remains slower and more effortful than typing on a physical keyboard or touchscreen. We explore a simple idea: use gaze to swipe through a virtual keyboard for the fast, low-effort "where," and a manual pinch held throughout the swipe for the "when," extending and validating this idea through a series of user studies. We first show that a basic version, built on a low-latency decoder with spatiotemporal Dynamic Time Warping and fixation filtering, outperforms selecting individual keys sequentially, whether by finger-tapping each key or by gazing at each key while pinching. We then add mid-swipe prediction and in-gesture cancellation, improving words per minute (WPM) without hurting accuracy. We show that this approach is faster than, and preferred over, previous gaze-swipe approaches, finger tapping with prediction, and hand swiping with the same additions. Furthermore, a seven-day, 30-session study demonstrates sustained learning, with peak performance reaching 64.7 WPM.
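
The abstract describes the decoder only at a high level: spatiotemporal Dynamic Time Warping (DTW) over a fixation-filtered gaze path. As a minimal illustration, and not the authors' implementation, the Python sketch below matches a gaze path against per-word key-center templates using a simple dispersion-style fixation filter and plain spatial DTW; all names, thresholds, and coordinates are hypothetical, and the paper's spatiotemporal variant would additionally account for sample timing, which this sketch omits.

import math

def fixation_filter(path, radius=0.02, min_samples=3):
    """Collapse runs of near-stationary gaze samples into fixation centroids.

    A crude dispersion-style filter; `radius` and `min_samples` are
    made-up thresholds, not values from the paper.
    """
    fixations, cluster = [], [path[0]]
    for p in path[1:]:
        cx = sum(q[0] for q in cluster) / len(cluster)
        cy = sum(q[1] for q in cluster) / len(cluster)
        if math.dist(p, (cx, cy)) <= radius:
            cluster.append(p)
        else:
            if len(cluster) >= min_samples:
                fixations.append((cx, cy))
            cluster = [p]
    if len(cluster) >= min_samples:
        cx = sum(q[0] for q in cluster) / len(cluster)
        cy = sum(q[1] for q in cluster) / len(cluster)
        fixations.append((cx, cy))
    return fixations or path  # fall back to the raw path if nothing survives

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping over 2-D points."""
    inf = float("inf")
    d = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[-1][-1]

def decode(gaze_path, templates, top_k=3):
    """Rank candidate words by DTW distance between the filtered gaze
    path and each word's template of key centers."""
    filtered = fixation_filter(gaze_path)
    ranked = sorted(templates.items(),
                    key=lambda kv: dtw_distance(filtered, kv[1]))
    return [word for word, _ in ranked[:top_k]]

if __name__ == "__main__":
    # Toy demo with hypothetical normalized key centers.
    keys = {"h": (0.55, 0.55), "i": (0.70, 0.25), "o": (0.80, 0.25)}
    templates = {"hi": [keys["h"], keys["i"]], "ho": [keys["h"], keys["o"]]}
    # A noisy gaze swipe that dwells on 'h', then sweeps to 'i'.
    gaze = [(0.55, 0.56), (0.56, 0.55), (0.55, 0.54),
            (0.62, 0.40), (0.70, 0.26), (0.70, 0.25), (0.71, 0.25)]
    print(decode(gaze, templates))  # -> ['hi', 'ho']

In a real pipeline, decode would presumably run incrementally while the pinch is held, which is what would enable the mid-swipe prediction the abstract mentions.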

Paper Structure

This paper contains 63 sections, 9 equations, 18 figures, and 10 tables.

Figures (18)

  • Figure 1: Techniques evaluated in US1: two simple XR baselines, Finger-Tap (letter-by-letter mid-air tapping on a virtual keyboard) and Gaze&Pinch (gaze-targeted key selection with a pinch delimiter), compared against our proposed pinch-delimited gaze-swipe technique, SwEYEpinch-Basic.
  • Figure 2: Performance results from US1 across five sessions. From left to right: WPM for the three US1 techniques; total error rate (TER) across the US1 conditions; and SwEYEpinch-Basic match and miss rates across sessions. Each bar represents the mean percentage of text-entry outcomes in three categories: (1) First Candidate Match: the first suggestion was correct, the most desirable outcome because the user can move directly to the next word; (2) Any Candidate Match: at least one suggestion was correct; and (3) All Candidate Miss: no suggested candidate matched. Overall, the figure shows how SwEYEpinch-Basic's suggestion accuracy evolves with practice across sessions. Error bars show standard errors.
  • Figure 3: Top: Pareto frontiers showing normalized user preference vs. WPM. Bottom: Raw NASA TLX scores.
  • Figure 4: Techniques evaluated in US2: two gaze-swipe baselines with different delimiters, SkiMR (with the delimiting space key placed above the keyboard, following [hu2024skimr]) and GlanceWriter XR, alongside our proposed SwEYEpinch-Basic and its improved version, SwEYEpinch.
  • Figure 5: Performance results from US2. From left to right: WPM for the techniques across sessions, and TER across conditions. Error bars show standard errors; the match and miss rates are defined as in Figure 2. Also shown: a comparison of the decoding algorithms of SwEYEpinch and GlanceWriter [cui2023glancewriter].
  • ...and 13 more figures