GazeProphetV2: Head-Movement-Based Gaze Prediction Enabling Efficient Foveated Rendering on Mobile VR

Farhaan Ebadulla; Chiraag Mudlpaur; Shreya Chaurasia; Gaurav BV

TL;DR

This work predicts user gaze in VR to enable efficient foveated rendering without relying on expensive eye-tracking hardware. A multimodal architecture combines temporal gaze history, head orientation, and scene content through a CNN-based scene encoder, dual LSTMs, and a three-way gated fusion, and is trained with multi-step autoregressive prediction and auxiliary losses for robust learning. The approach achieves strong cross-scene generalization (about 93.1% accuracy for 1-3 frames ahead on 22 scenes) and maintains real-time performance (~88 FPS, ~4.21 ms latency), and a user study indicates that perceptual quality is preserved alongside rendering savings. These results suggest practical potential for attention-aware VR rendering on mobile hardware and provide a foundation for adding further multimodal predictive cues in immersive environments.

Abstract

Predicting gaze behavior in virtual reality environments remains a significant challenge with implications for rendering optimization and interface design. This paper introduces a multimodal approach to VR gaze prediction that combines temporal gaze patterns, head movement data, and visual scene information. Using a gated fusion mechanism with cross-modal attention, the approach learns to adaptively weight gaze history, head movement, and scene content based on contextual relevance. Evaluations on a dataset spanning 22 VR scenes with 5.3M gaze samples show that combining modalities improves predictive accuracy over using any individual data stream alone. The results indicate that integrating past gaze trajectories with head orientation and scene content enhances prediction accuracy across 1-3 future frames. Cross-scene generalization testing shows consistent performance, with 93.1% validation accuracy and temporally consistent predicted gaze trajectories. These findings contribute to understanding attention mechanisms in virtual environments and suggest potential applications in rendering optimization, interaction design, and user experience evaluation. The approach represents a step toward more efficient virtual reality systems that can anticipate user attention patterns without requiring expensive eye-tracking hardware.
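
The gated fusion idea described above, adaptively weighting gaze history, head movement, and scene content by context, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each modality has already been encoded into a feature vector of the same size, uses a single linear gating layer followed by a softmax (the paper's gating function may differ), and omits the cross-modal attention, the CNN scene encoder, and the LSTMs. The names `gated_fusion`, `gate_weights`, and `gate_bias` are hypothetical, and the weights are random rather than learned.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(gaze_feat, head_feat, scene_feat, gate_weights, gate_bias):
    """Fuse three equal-length feature vectors with context-dependent gates.

    gate_weights: 3 rows, each of length 3 * dim (hypothetical gating layer).
    gate_bias:    3 floats, one per modality.
    Returns (fused_vector, gates), where gates sum to 1.
    """
    # The gating context is the concatenation of all three modalities.
    context = gaze_feat + head_feat + scene_feat
    logits = [
        sum(w * x for w, x in zip(row, context)) + b
        for row, b in zip(gate_weights, gate_bias)
    ]
    gates = softmax(logits)
    # Weighted sum of the modality features, element by element.
    fused = [
        gates[0] * g + gates[1] * h + gates[2] * s
        for g, h, s in zip(gaze_feat, head_feat, scene_feat)
    ]
    return fused, gates


# Toy inputs: random stand-ins for encoded gaze, head, and scene features.
rng = random.Random(0)
dim = 4
gaze = [rng.uniform(-1, 1) for _ in range(dim)]
head = [rng.uniform(-1, 1) for _ in range(dim)]
scene = [rng.uniform(-1, 1) for _ in range(dim)]
W = [[rng.uniform(-0.5, 0.5) for _ in range(3 * dim)] for _ in range(3)]
b = [0.0, 0.0, 0.0]

fused, gates = gated_fusion(gaze, head, scene, W, b)
print("gates (gaze, head, scene):", [round(g, 3) for g in gates])
```

Because the gates are computed from the inputs themselves, the weighting changes from sample to sample. In a trained model, for example, this would allow head movement to receive more weight during a fast head turn, though whether the actual model behaves this way is not stated in the abstract.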
