Unknown Word Detection for English as a Second Language (ESL) Learners Using Gaze and Pre-trained Language Models

Jiexin Ding; Bowen Zhao; Yuntao Wang; Xinyun Liu; Rui Hao; Ishan Chatterjee; Yuanchun Shi

TL;DR

EyeLingo detects words unknown to ESL readers by combining gaze trajectories with a pre-trained language model (PLM) in a transformer framework. The system uses gaze data to locate a region of interest, then fuses RoBERTa-based contextual text representations with word-level knowledge to predict unknown words in real time. It is trained with focal loss to handle class imbalance. On professional eye-tracker data the method reaches up to 97.6% accuracy and a 71.1% F1 score; on webcam data the F1 score is about 65%. The results indicate that PLM features drive performance, while gaze contributes personalized cues. A real-time reading assistance prototype showed faster reading and higher willingness to use than traditional lookup, with latency within real-time constraints, which suggests potential for deployment on consumer devices and for vocabulary learning support.
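The focal loss mentioned above can be illustrated with a minimal sketch. It uses the standard binary focal loss formulation, not EyeLingo's actual training code; the function name `binary_focal_loss` and the default values `gamma=2.0` and `alpha=0.25` are assumptions chosen for illustration and are not taken from the paper.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Standard binary focal loss for a single prediction.

    p: predicted probability that the word is unknown (positive class).
    y: ground-truth label, 1 for unknown and 0 for known.
    gamma, alpha: illustrative defaults, not values reported by the paper.
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)            # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p             # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A confidently correct prediction contributes far less than a confident miss.
easy = binary_focal_loss(0.9, 1)
hard = binary_focal_loss(0.1, 1)
print(easy < hard)
```

With `gamma=0` the modulating factor disappears and the expression reduces to class-weighted cross-entropy. Larger `gamma` values shift the training signal toward hard examples, which is why the loss suits tasks where one class (here, known words) vastly outnumbers the other.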

Abstract

English as a Second Language (ESL) learners often encounter unknown words that hinder their text comprehension. Automatically detecting these words as users read can enable computing systems to provide just-in-time definitions, synonyms, or contextual explanations, thereby helping users learn vocabulary in a natural and seamless manner. This paper presents EyeLingo, a transformer-based machine learning method that predicts the probability of unknown words based on text content and eye gaze trajectory in real time with high accuracy. A 20-participant user study revealed that our method achieves an accuracy of 97.6% and an F1-score of 71.1%. We implemented a real-time reading assistance prototype to show the effectiveness of EyeLingo. The user study shows improvements in willingness to use and usefulness compared to baseline methods.
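The gap between the reported accuracy and F1-score is typical of imbalanced tasks, where most words are known to the reader. The sketch below computes both metrics from confusion-matrix counts. The counts in the usage lines are invented purely for illustration and are not data from the study, and the function name `accuracy_and_f1` is likewise hypothetical.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Return (accuracy, F1) for the positive class from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1


# Hypothetical counts: 45 unknown words among 1000, most of them detected.
acc, f1 = accuracy_and_f1(tp=30, fp=10, fn=15, tn=945)
print(f"accuracy={acc:.3f}, f1={f1:.3f}")
```

Because true negatives dominate the total, accuracy stays high even when a noticeable share of unknown words is missed. F1 ignores true negatives, so it is the more informative number for this task.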
