GazeIntent: Adapting dwell-time selection in VR interaction with real-time intent modeling

Anish S. Narkar; Jan J. Michalak; Candace E. Peacock; Brendan David-John

TL;DR

GazeIntent addresses the Midas touch problem in gaze-only VR interaction by introducing a real-time gaze-intent model that scales dwell-time thresholds. An LSTM-based intent predictor (F1 = 0.94 offline) is trained on a VR divisibility dataset and deployed to adapt dwell times through a scaling factor computed from recent predictions, enabling faster and more reliable selections. In end-user studies with new and returning users, the generalized (GI-G) and especially the personalized (GI-P) GazeIntent variants improve interaction speed and are preferred, demonstrating benefits of both task generalization and personalization. While promising, the work notes limitations in sample size and calls for future work on threshold adaptation, continual learning, and broader task contexts, along with privacy considerations for predictive modeling in VR.
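The TL;DR describes dwell times being scaled by a factor computed from recent intent predictions, and the Figure 2 caption notes a sliding window of the last four predictions. The Python sketch below illustrates that feedback loop under stated assumptions only: the paper's actual scaling-factor equation is not reproduced in this summary, so the rule used here (scaling factor = 1 minus the mean predicted intent probability in the window, clipped at a floor `min_scale`), the class name `DwellScaler`, and the 1000 ms base dwell are all hypothetical.

```python
from collections import deque


class DwellScaler:
    """Hypothetical sketch: shorten the dwell threshold when recent intent is high.

    NOTE: the scaling rule below is an illustrative assumption, not the
    paper's scaling-factor equation.
    """

    def __init__(self, base_dwell_ms: float = 1000.0, window_size: int = 4,
                 min_scale: float = 0.3) -> None:
        self.base_dwell_ms = base_dwell_ms  # hypothetical static dwell time
        self.min_scale = min_scale          # hypothetical floor on the scaling factor
        # Sliding window of the most recent intent-model outputs (oldest dropped first).
        self.window = deque(maxlen=window_size)

    def update(self, intent_prob: float) -> float:
        """Add one intent probability in [0, 1] and return the dwell threshold in ms."""
        if not 0.0 <= intent_prob <= 1.0:
            raise ValueError(f"intent_prob must be in [0, 1], got {intent_prob}")
        self.window.append(intent_prob)
        mean_intent = sum(self.window) / len(self.window)
        # Higher recent intent -> smaller scaling factor -> shorter dwell.
        scaling_factor = max(self.min_scale, 1.0 - mean_intent)
        return self.base_dwell_ms * scaling_factor
```

For example, feeding the predictions 0.1, 0.2, 0.9, 0.95, 0.97 yields thresholds of roughly 900, 850, 600, 462.5, and 300 ms; the last value hits the floor because only the four most recent predictions are averaged. The floor keeps a sustained high-intent streak from collapsing the dwell time to zero, which would reintroduce accidental selections.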

Abstract

The use of machine learning (ML) models to predict a user's cognitive state from behavioral data has been studied for various applications, including predicting the intent to perform selections in VR. We developed a novel technique that uses gaze-based intent models to adapt dwell-time thresholds to aid gaze-only selection. A dataset of users performing selections in arithmetic tasks was used to develop intent prediction models (F1 = 0.94). We developed GazeIntent to adapt selection dwell times based on intent model outputs and conducted an end-user study in which returning and new users performed additional tasks with varied selection frequencies. Personalized models for returning users effectively accounted for prior experience and were preferred by 63% of users. Our work provides the field with methods to adapt dwell-based selection to users, account for experience over time, and consider tasks that vary in selection frequency.

Paper Structure (42 sections, 2 equations, 5 figures, 2 tables)

Introduction
Related Works and Motivation
Dwell Time Gaze-based Selection
Gaze-based Prediction
Gaze-Intent based selection
Training Dataset & Model Development
Data Collection
Data Pipeline
Signal Processing
Event Detection
Feature Extraction
Data Windowing
Data Labeling
Model Training and Selection
Results
...and 27 more sections

Figures (5)

Figure 1: We conducted a user study to build a comprehensive VR interaction dataset, trained intent prediction models on the dataset, and deployed them with GazeIntent, our adaptive dwell time scaling gaze selection method.
Figure 2: Intent modeling training process and GazeIntent system architecture for scaling dwell time. Raw GIW data is processed to create a dataset and train a temporal intent model. The intent model is deployed by maintaining a sliding window of the last four predictions to compute a scaling factor $S_f$ (see the scaling-factor equation). $S_f$ is then used to dynamically adjust the dwell-time threshold for selection.
Figure 3: Qualitative measures: box-plot distributions of perceived selection speed across tasks. Users reported faster system selection with GazeIntent-Personal and GazeIntent-General. Significant differences between groups are marked with *** (p < .001).
Figure 4: Session 2 user rankings for returning users (a, b, c), who evaluated four methods, and new users (d, e, f), who evaluated three methods. (a, d): Fitts' Law-like task; (b, e): arithmetic task; (c, f): sliding puzzle task. Returning users preferred personalized models for the Fitts' Law-style and arithmetic tasks and the Static method for the sliding puzzle. New users preferred the generalized GazeIntent method only for the arithmetic task it was trained on, and the Static baseline otherwise.
Figure 5: Mean and standard deviation of the F1 score for the ten optimal model configurations, evaluated on the entire test dataset. The models are trained on successive temporal splits of the training sessions: splits 1 through 5 correspond to models trained sequentially on the first 20%, 40%, 60%, 80%, and 100% of the training data.
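Figure 5 describes models trained on cumulative temporal splits (the first 20%, 40%, 60%, 80%, and 100% of the training data). A minimal Python sketch of how such nested, chronologically ordered prefixes could be produced follows. It assumes the training data is a list of session records already sorted by collection time; the function name `temporal_splits` and the rounding rule are hypothetical and not taken from the paper.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def temporal_splits(
    sessions: Sequence[T],
    fractions: Sequence[float] = (0.2, 0.4, 0.6, 0.8, 1.0),
) -> List[List[T]]:
    """Return cumulative prefixes of a chronologically ordered sequence.

    Hypothetical helper: split i holds the earliest fractions[i] share of the
    sessions, so every split is a superset of the one before it.
    """
    n = len(sessions)
    splits = []
    for frac in fractions:
        if not 0.0 < frac <= 1.0:
            raise ValueError(f"fraction must be in (0, 1], got {frac}")
        # Keep at least one session so no split is empty (when data exists).
        k = max(1, round(frac * n))
        splits.append(list(sessions[:k]))
    return splits
```

With ten sessions, `temporal_splits(list(range(10)))` returns prefixes of length 2, 4, 6, 8, and 10. Each prefix would then train one model, and every model would be scored on the same held-out test set, which is the comparison the Figure 5 caption describes. Using prefixes rather than random subsets preserves time order, so later splits only ever add newer data.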
