Quantifying the Impact of Motion on 2D Gaze Estimation in Real-World Mobile Interactions
Yaxiong Lei, Yuheng Wang, Fergus Buchanan, Mingyue Zhao, Yusuke Sugano, Shijing He, Mohamed Khamis, Juan Ye
TL;DR
This work addresses the challenge of accurate 2D gaze estimation on handheld devices under real-world motion. By coupling a mobile gaze system with synchronized IMU and vision data, two user studies quantify how head distance, head pose, and device orientation drive gaze errors; static conditions yield notably lower RMSE than dynamic ones. A key contribution is the use of Lasso regression to identify the dominant factors, which shows that head distance and orientation account for the majority of error variance and motivates adaptive, motion-aware calibration and potential 2D-3D hybrid approaches. In practice, the findings suggest that robust mobile eye-tracking systems need frequent or context-aware recalibration and integrated orientation cues to maintain accuracy across diverse mobile contexts.
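To illustrate how Lasso regression can single out dominant error factors, the following is a minimal sketch on synthetic data, not the authors' analysis pipeline. The feature names, the sample size, the coefficients used to generate the data, and the penalty `alpha` are all assumptions chosen for illustration. The L1 penalty shrinks weak coefficients towards zero, so the surviving coefficients indicate which standardized factors matter most.

```python
import random


def standardize(col):
    """Rescale a column to zero mean and unit (population) variance."""
    n = len(col)
    mean = sum(col) / n
    sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
    return [(v - mean) / sd for v in col]


def soft_threshold(rho, alpha):
    """Shrink rho towards zero by alpha; values within [-alpha, alpha] become 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_coordinate_descent(columns, y, alpha, sweeps=200):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 for standardized columns."""
    n = len(y)
    w = [0.0] * len(columns)
    residual = list(y)  # equals y - Xw while w is all zeros
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            # Covariance of feature j with the partial residual (its own term added back).
            rho = sum(x * (r + w[j] * x) for x, r in zip(col, residual)) / n
            new_w = soft_threshold(rho, alpha)  # unit-variance columns need no rescaling
            delta = new_w - w[j]
            if delta != 0.0:
                residual = [r - delta * x for r, x in zip(residual, col)]
                w[j] = new_w
    return w


# Synthetic, hypothetical data: gaze error driven mostly by head distance.
rng = random.Random(0)
names = ["head_distance", "head_pitch", "device_roll", "unrelated"]
raw = {name: [rng.gauss(0.0, 1.0) for _ in range(300)] for name in names}
error = [
    2.0 * d + 1.0 * p + 0.5 * r + rng.gauss(0.0, 0.2)  # arbitrary generating weights
    for d, p, r in zip(raw["head_distance"], raw["head_pitch"], raw["device_roll"])
]

mean_error = sum(error) / len(error)
centered_error = [e - mean_error for e in error]  # centring removes the intercept
columns = [standardize(raw[name]) for name in names]

weights = lasso_coordinate_descent(columns, centered_error, alpha=0.1)
coefs = dict(zip(names, weights))
for name in sorted(coefs, key=lambda k: -abs(coefs[k])):
    print(f"{name}: {coefs[name]:.3f}")
```

Ranking factors by the absolute value of their coefficients only makes sense here because every feature is standardized first; on raw units (centimetres versus degrees) the magnitudes would not be comparable.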
Abstract
Mobile gaze tracking involves inferring a user's gaze point or direction on a mobile device's screen from facial images captured by the device's front camera. While this technology inspires an increasing number of gaze-interaction applications, achieving consistent accuracy remains challenging due to dynamic user-device spatial relationships and varied motion conditions inherent in mobile contexts. This paper provides empirical evidence on how user mobility and behaviour affect mobile gaze tracking accuracy. We conduct two user studies collecting behaviour and gaze data under various motion conditions (from lying down to maze navigation) and during different interaction tasks. Quantitative analysis reveals behavioural regularities among daily tasks and identifies head distance, head pose, and device orientation as key factors affecting accuracy, with errors increasing by up to 48.91% in dynamic conditions compared to static ones. These findings highlight the need for more robust, adaptive eye-tracking systems that account for head movements and device deflection to maintain accuracy across diverse mobile contexts.
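As a worked illustration of how a figure such as "errors increasing by up to 48.91%" can be derived, the sketch below computes the RMSE of 2D on-screen gaze errors per condition and the relative increase from static to dynamic. The function names, the unit (centimetres), and all coordinates are hypothetical and are not taken from the paper's data.

```python
import math


def point_errors(predicted, targets):
    """Euclidean distance between each predicted and target 2D screen point."""
    return [math.hypot(px - tx, py - ty) for (px, py), (tx, ty) in zip(predicted, targets)]


def rmse(errors):
    """Root mean squared error over a list of per-sample distances."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def relative_increase_pct(static_value, dynamic_value):
    """Percentage by which the dynamic value exceeds the static baseline."""
    return 100.0 * (dynamic_value - static_value) / static_value


# Hypothetical gaze targets and predictions in centimetres on the screen plane.
targets = [(2.0, 3.0), (5.0, 8.0), (1.0, 10.0)]
static_pred = [(2.5, 3.2), (4.6, 8.9), (1.3, 9.1)]    # e.g. sitting still
dynamic_pred = [(3.1, 3.9), (4.0, 9.4), (2.2, 8.8)]   # e.g. walking

static_rmse = rmse(point_errors(static_pred, targets))
dynamic_rmse = rmse(point_errors(dynamic_pred, targets))
increase = relative_increase_pct(static_rmse, dynamic_rmse)
print(f"static RMSE {static_rmse:.2f} cm, dynamic RMSE {dynamic_rmse:.2f} cm, "
      f"increase {increase:.2f}%")
```

The relative increase is computed against the static baseline, so the same absolute gap reads as a larger percentage when the static error is small.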
