GazeProphet: Software-Only Gaze Prediction for VR Foveated Rendering
Farhaan Ebadulla, Chiraag Mudlapur, Gaurav BV
TL;DR
GazeProphet is a software-only approach to predicting gaze in VR, enabling foveated rendering without eye-tracking hardware. It fuses spatial features from a Spherical Vision Transformer with temporal gaze dynamics from an LSTM temporal encoder, using a multi-modal fusion network to output gaze coordinates and a confidence score. The approach achieves a median angular error of $3.83^\circ$, outperforming saliency baselines by $24\%$, and performs consistently across spatial regions and scene types, with statistically significant improvements. This work lowers the barrier to adopting foveated rendering, enabling VR optimization on existing hardware and broadening accessibility across platforms.
Abstract
Foveated rendering significantly reduces computational demands in virtual reality applications by concentrating rendering quality where users focus their gaze. Current approaches require expensive hardware-based eye-tracking systems, limiting widespread adoption due to cost, calibration complexity, and hardware compatibility constraints. This paper presents GazeProphet, a software-only approach for predicting gaze locations in VR environments without requiring dedicated eye-tracking hardware. The approach combines a Spherical Vision Transformer for processing 360-degree VR scenes with an LSTM-based temporal encoder that captures gaze sequence patterns. A multi-modal fusion network integrates spatial scene features with temporal gaze dynamics to predict future gaze locations with associated confidence estimates. Experimental evaluation on a comprehensive VR dataset demonstrates that GazeProphet achieves a median angular error of 3.83 degrees, outperforming traditional saliency-based baselines by 24% while providing reliable confidence calibration. The approach maintains consistent performance across different spatial regions and scene types, enabling practical deployment in VR systems without additional hardware requirements. Statistical analysis confirms the significance of improvements across all evaluation metrics. These results show that software-only gaze prediction is viable for VR foveated rendering, making its performance benefits accessible to a wider range of VR platforms and applications.
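The headline metric above, median angular error, can be illustrated with a minimal sketch. This section does not give the exact definition used in the evaluation, so the sketch assumes gaze points are (longitude, latitude) pairs in degrees on the viewing sphere and measures the great-circle angle between predicted and ground-truth directions. The function names `to_unit_vector`, `angular_error_deg`, and `median_angular_error_deg` are hypothetical and chosen only for illustration.

```python
import math
from statistics import median


def to_unit_vector(lon_deg, lat_deg):
    """Convert (longitude, latitude) in degrees to a 3D unit vector.

    Assumption: gaze is expressed as spherical coordinates on the
    360-degree scene; the paper's own coordinate convention may differ.
    """
    lon = math.radians(lon_deg)
    lat = math.radians(lat_deg)
    return (
        math.cos(lat) * math.cos(lon),
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
    )


def angular_error_deg(pred, truth):
    """Great-circle angle in degrees between two gaze directions."""
    p = to_unit_vector(*pred)
    t = to_unit_vector(*truth)
    dot = sum(a * b for a, b in zip(p, t))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    dot = max(-1.0, min(1.0, dot))
    return math.degrees(math.acos(dot))


def median_angular_error_deg(preds, truths):
    """Median of per-sample angular errors over paired gaze samples."""
    return median(angular_error_deg(p, t) for p, t in zip(preds, truths))
```

Working with unit vectors rather than raw coordinate differences handles the longitude wrap-around of 360-degree content: under these assumptions, a prediction at longitude 170 and a ground truth at longitude -170 on the equator are 20 degrees apart, not 340.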
