Unknown Word Detection for English as a Second Language (ESL) Learners Using Gaze and Pre-trained Language Models
Jiexin Ding, Bowen Zhao, Yuntao Wang, Xinyun Liu, Rui Hao, Ishan Chatterjee, Yuanchun Shi
TL;DR
EyeLingo detects words that ESL readers do not know by combining gaze trajectories with a pre-trained language model (PLM) in a transformer framework. The system uses gaze data to locate a region of interest, then fuses RoBERTa-based contextual text representations with word-level knowledge to predict unknown words in real time. It is trained with focal loss to handle class imbalance. On professional eye-tracker data, the method reaches up to 97.6% accuracy and a 71.1% F1 score; on webcam data, F1 remains around 65%. The results indicate that PLM features drive most of the performance, while gaze contributes personalized cues. A real-time reading assistance prototype led to faster reading and higher willingness to use than traditional lookup, with latency within real-time constraints, which suggests potential for deployment on consumer devices and for vocabulary learning support.
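The role of focal loss under class imbalance can be illustrated with a minimal sketch. It implements the standard binary focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), for a single word-level prediction. The function name and the defaults gamma=2.0 and alpha=0.25 are assumptions taken from the common formulation of focal loss; this summary does not state the settings EyeLingo uses.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one binary prediction (illustrative, not the authors' code).

    p: predicted probability that the word is unknown (positive class)
    y: 1 if the word is actually unknown, 0 otherwise
    gamma, alpha: assumed defaults; the paper's values are not given here
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)           # avoid log(0)
    p_t = p if y == 1 else 1.0 - p            # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples,
    # so the many easy "known word" cases contribute little to training.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy_negative = binary_focal_loss(0.05, 0)    # known word, confidently predicted known
hard_negative = binary_focal_loss(0.70, 0)    # known word, wrongly leaning unknown
print(easy_negative < hard_negative)
```

With gamma set to 0 the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, which is a convenient sanity check for any implementation.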
Abstract
English as a Second Language (ESL) learners often encounter unknown words that hinder their text comprehension. Automatically detecting these words as users read can enable computing systems to provide just-in-time definitions, synonyms, or contextual explanations, thereby helping users learn vocabulary in a natural and seamless manner. This paper presents EyeLingo, a transformer-based machine learning method that predicts the probability that a word is unknown to the reader, based on text content and eye gaze trajectory, in real time and with high accuracy. A 20-participant user study showed that our method achieves an accuracy of 97.6% and an F1-score of 71.1%. We implemented a real-time reading assistance prototype to demonstrate the effectiveness of EyeLingo. In a user study, the prototype improved willingness to use and usefulness compared to baseline methods.
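Accuracy and F1 can diverge sharply when unknown words are rare, which is why both are reported. The sketch below computes the two metrics from confusion-matrix counts. The counts are hypothetical and chosen only for illustration; they are not data from the paper, and the function name is an assumption.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Return (accuracy, F1) for the positive class from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1


# Hypothetical counts (NOT from the paper): 1000 words read, 50 truly unknown.
acc, f1 = accuracy_and_f1(tp=30, fp=10, fn=20, tn=940)

# A trivial detector that never flags a word still scores high accuracy
# on the same hypothetical data, but its F1 is zero.
base_acc, base_f1 = accuracy_and_f1(tp=0, fp=0, fn=50, tn=950)
print(acc, f1, base_acc, base_f1)
```

Because a detector that never flags anything already gets most words right, F1 on the unknown-word class is the more informative of the two numbers for this task.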
