TimeGazer: Temporal Modeling of Predictive Gaze Stabilization for AR Interaction

Yaozheng Xia, Zaiping Zhu, Bo Pang, Shaorong Wang, Sheng Li

TL;DR

TimeGazer addresses unstable fixation in AR by reframing gaze stabilization as a seq2seq temporal regression that predicts future idealized fixation trajectories of length $\tau$ from past gaze during the search phase. It introduces synthetic data augmentation to enrich training, constructs a dedicated TimeGazer-dataset from 54 HoloLens 2 users, and deploys a TimesNet-based architecture with a dual projection module to predict cohesive gaze sequences in real time. The study demonstrates that TimeGazer improves interaction accuracy and reduces task duration across AR tasks and MRTK scenarios, with significant gains over native gaze tracking and strong ablation results supporting the architectural choices. This temporal modeling approach offers a scalable path toward adaptive, gaze-driven interfaces in immersive systems, enabling more reliable object selection, navigation, and attention-aware interaction in AR.
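To make the seq2seq framing concrete, the sketch below cuts one gaze recording into (past, future) training pairs: each input is a window of `history` past samples and each target is the next $\tau$ samples. This is a minimal illustration under assumptions, not the authors' pipeline. The 2-D `(x, y)` point format, the one-step stride, and the names `make_windows` and `history` are hypothetical, and the optional `targets` argument stands in for whatever idealized fixation reference the paper pairs with each input.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical 2-D gaze sample (x, y); the paper's actual coordinate space may differ.
Point = Tuple[float, float]
Sample = Tuple[List[Point], List[Point]]


def make_windows(
    gaze: Sequence[Point],
    history: int,
    tau: int,
    targets: Optional[Sequence[Point]] = None,
) -> List[Sample]:
    """Slide a window over one recording to build (past, future) regression pairs.

    Input: `history` consecutive gaze samples. Target: the next `tau` samples,
    taken from `targets` (e.g. an idealized fixation reference) if given,
    otherwise from the raw gaze itself. The window advances one step at a time.
    """
    if history <= 0 or tau <= 0:
        raise ValueError("history and tau must be positive")
    ref = gaze if targets is None else targets
    if len(ref) != len(gaze):
        raise ValueError("targets must be aligned with gaze")
    pairs: List[Sample] = []
    for start in range(len(gaze) - history - tau + 1):
        past = list(gaze[start:start + history])
        future = list(ref[start + history:start + history + tau])
        pairs.append((past, future))
    return pairs


# Usage: 10 samples, 4-step history, horizon tau = 2 -> 5 training pairs.
gaze = [(float(i), 0.0) for i in range(10)]
pairs = make_windows(gaze, history=4, tau=2)
print(len(pairs))   # 5
print(pairs[0][1])  # [(4.0, 0.0), (5.0, 0.0)]
```

Each pair is one regression example; changing `tau` yields the different prediction horizons the study evaluates.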

Abstract

Gaze stabilization is critical for enabling fluid, accurate, and efficient interaction in immersive augmented reality (AR) environments, particularly during task-oriented visual behaviors. However, fixation sequences captured in active gaze tasks often exhibit irregular dispersion and systematic deviations from target locations. This variability stems primarily from the combined effects of human oculomotor physiology, limited tracking and calibration accuracy of AR headsets, and environmental disturbances, and it undermines interaction performance and visual engagement. To address this issue, we propose TimeGazer, which reformulates gaze stabilization as a sequence-to-sequence temporal regression problem: it predicts idealized fixation trajectories for the target-fixation phase from historical gaze dynamics in the search phase. We present a synthetic data generation and blending strategy that produces spatially concentrated, target-centered fixation references aligned with task objectives, substantially enriching the training space and enhancing model generalization. We train and evaluate TimeGazer across multiple prediction horizons on a hybrid dataset of real and augmented gaze sequences collected with Microsoft HoloLens 2 from 54 participants. Statistical results from a user study demonstrate that TimeGazer significantly improves interaction accuracy and reduces completion time, confirming that temporal modeling of predictive gaze stabilization can strengthen attentional consistency and responsiveness in task-driven AR interaction. These findings highlight the broader potential of TimeGazer for advancing adaptive gaze-based interfaces and temporal modeling research in immersive systems.
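One way to picture the synthetic generation and blending step is sketched below: draw a target-centered reference with small Gaussian jitter, then mix it point-wise into a real fixation sequence. The Gaussian model, the convex blend weight `alpha`, and the helper names `synth_fixation`, `blend`, and `dispersion` are all illustrative assumptions; the abstract does not specify the actual generator or blending rule.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # hypothetical 2-D gaze sample


def synth_fixation(target: Point, length: int, sigma: float, rng: random.Random) -> List[Point]:
    """Assumed generator: isotropic Gaussian jitter of std `sigma` around the target."""
    tx, ty = target
    return [(rng.gauss(tx, sigma), rng.gauss(ty, sigma)) for _ in range(length)]


def blend(real: Sequence[Point], synth: Sequence[Point], alpha: float) -> List[Point]:
    """Assumed blending rule: point-wise convex mix (alpha=0 -> real, alpha=1 -> synthetic)."""
    if len(real) != len(synth):
        raise ValueError("sequences must have equal length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [
        ((1.0 - alpha) * rx + alpha * sx, (1.0 - alpha) * ry + alpha * sy)
        for (rx, ry), (sx, sy) in zip(real, synth)
    ]


def dispersion(seq: Sequence[Point], target: Point) -> float:
    """Mean Euclidean distance from each sample to the target (lower = more concentrated)."""
    tx, ty = target
    return sum(math.hypot(x - tx, y - ty) for x, y in seq) / len(seq)


# Usage: a real fixation biased by (+1, +1) is pulled toward the target at (0, 0).
rng = random.Random(0)
target = (0.0, 0.0)
real = [(1.0, 1.0)] * 20
mixed = blend(real, synth_fixation(target, len(real), sigma=0.05, rng=rng), alpha=0.7)
print(dispersion(real, target) > dispersion(mixed, target))  # True
```

Sweeping `alpha` (or `sigma`) would produce a family of references between the raw gaze and a tightly target-centered fixation, which is one plausible way to enlarge the training space.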

Paper Structure

This paper contains 34 sections, 18 equations, 8 figures, 11 tables.

Figures (8)

  • Figure 1: Real-world background and visual elements in the visual environments.
  • Figure 2: Experiment settings.
  • Figure 3: A typical gaze trajectory example.
  • Figure 4: Architecture of our TimeGazer model. Raw historical gaze sequences are first encoded via an embedding module, followed by temporal feature extraction and sequence-to-sequence modeling using TimesNet. The projection module then reduces the temporal representation to generate the final predicted gaze sequence.
  • Figure 5: Sliding window training strategy.
  • ...and 3 more figures
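TimesNet, on which the architecture builds, finds dominant periods with an FFT and folds the 1-D series into a 2-D layout (one row per period) before applying 2-D convolutions; Figure 4 also describes a projection module that maps the temporal representation to the output sequence. The sketch below illustrates those two ideas in plain Python under stated assumptions: a naive DFT replaces a library FFT, only the period search and fold are shown (no embedding or convolution), and `project_time` is a hypothetical single linear map over the time axis, not the paper's dual projection module, whose internals are not given here.

```python
import cmath
import math
from typing import List, Sequence

Series = Sequence[Sequence[float]]  # indexed as x[t][c]: T time steps, C channels


def dft_amplitudes(x: Series) -> List[float]:
    """Channel-averaged |DFT| for frequencies 0..T//2 (naive O(T^2 C) transform)."""
    T, C = len(x), len(x[0])
    amps = []
    for f in range(T // 2 + 1):
        total = 0.0
        for c in range(C):
            coeff = sum(x[t][c] * cmath.exp(-2j * math.pi * f * t / T) for t in range(T))
            total += abs(coeff)
        amps.append(total / C)
    return amps


def top_k_periods(x: Series, k: int) -> List[int]:
    """TimesNet-style period search: strongest k non-DC frequencies -> periods T // f."""
    amps = dft_amplitudes(x)
    amps[0] = 0.0  # ignore the DC (mean) component
    freqs = sorted(range(1, len(amps)), key=lambda f: amps[f], reverse=True)[:k]
    return [len(x) // f for f in freqs]


def fold(x: Series, period: int) -> List[List[List[float]]]:
    """Zero-pad to a multiple of `period`, then split into rows of one period each (1D -> 2D)."""
    T, C = len(x), len(x[0])
    rows = math.ceil(T / period)
    padded = [list(step) for step in x] + [[0.0] * C for _ in range(rows * period - T)]
    return [padded[r * period:(r + 1) * period] for r in range(rows)]


def project_time(h: Series, weights: Sequence[Sequence[float]], bias: Sequence[float]) -> List[List[float]]:
    """Hypothetical head: out[j][c] = sum_t weights[j][t] * h[t][c] + bias[j] (L steps -> tau steps)."""
    C = len(h[0])
    return [
        [sum(w_t * h[t][c] for t, w_t in enumerate(w_row)) + b for c in range(C)]
        for w_row, b in zip(weights, bias)
    ]


# Usage: a 2-channel signal with an 8-sample cycle, 32 samples long.
x = [[math.sin(2 * math.pi * t / 8), math.cos(2 * math.pi * t / 8)] for t in range(32)]
print(top_k_periods(x, k=1))  # [8]
print(len(fold(x, 8)))        # 4 rows of 8 steps
```

With `k > 1`, TimesNet processes one folded view per detected period and aggregates the results; that aggregation step is omitted from this sketch.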