Unknown Word Detection for English as a Second Language (ESL) Learners Using Gaze and Pre-trained Language Models

Jiexin Ding, Bowen Zhao, Yuntao Wang, Xinyun Liu, Rui Hao, Ishan Chatterjee, Yuanchun Shi

TL;DR

EyeLingo addresses the challenge of unknown word detection for ESL readers by jointly leveraging gaze trajectories and pre-trained language models within a transformer framework. The system locates a region of interest from gaze data and fuses RoBERTa-based contextual text representations with word-level knowledge to predict unknown words in real time, trained with focal loss to handle class imbalance. Empirical results show high accuracy (up to 97.6%) and strong F1 scores (71.1%) on professional eye-tracker data, with robust performance on webcam data (F1 ~65%), and clear evidence that PLM features drive performance while gaze contributes personalized cues. A real-time reading assistance prototype demonstrates practical benefits, including faster reading and higher willingness to use versus traditional lookup, with latency well within real-time constraints, indicating strong potential for deployment on consumer devices and for vocabulary learning support.
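The focal loss mentioned above can be illustrated with a minimal sketch of the standard binary formulation, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). The function name `binary_focal_loss` and the defaults `gamma=2.0` and `alpha=0.25` are assumptions chosen for illustration (they are common defaults in the focal loss literature); this summary does not state the values EyeLingo actually uses.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for a single example (illustrative, not the authors' code).

    p: predicted probability that the word is unknown (positive class).
    y: ground-truth label, 1 for unknown and 0 for known.
    gamma, alpha: assumed hyperparameters, not taken from the paper.
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)            # keep log() finite
    p_t = p if y == 1 else 1.0 - p             # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha # class-dependent weight
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A confidently correct "known" word contributes far less loss than a
# missed "unknown" word with the same predicted probability.
easy_negative = binary_focal_loss(0.1, 0)
hard_positive = binary_focal_loss(0.1, 1)
```

Because most words a reader encounters are known, down-weighting easy examples in this way keeps the abundant negative class from dominating training.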

Abstract

English as a Second Language (ESL) learners often encounter unknown words that hinder their text comprehension. Automatically detecting these words as users read can enable computing systems to provide just-in-time definitions, synonyms, or contextual explanations, thereby helping users learn vocabulary in a natural and seamless manner. This paper presents EyeLingo, a transformer-based machine learning method that predicts, in real time and with high accuracy, the probability that a word is unknown to the reader based on text content and eye gaze trajectory. A 20-participant user study showed that our method achieves an accuracy of 97.6% and an F1-score of 71.1%. We implemented a real-time reading assistance prototype to show the effectiveness of EyeLingo. The user study shows improvement in willingness to use and usefulness compared to baseline methods.

Paper Structure

This paper contains 42 sections, 4 equations, 12 figures, 6 tables.

Figures (12)

  • Figure 1: Our method uses gaze to locate the content the user is reading in real time, then feeds the gaze data and text data to the transformer-based model to detect unknown words.
  • Figure 2: Methodology for Detecting Unknown Words Using Integrated Text and Gaze Information. This integrated approach leverages both linguistic and user-dependent factors to effectively identify unknown words.
  • Figure 3: Our model includes an encoder-decoder model to encode positional data, a pre-trained RoBERTa to encode text information, and learnable embeddings to encode the knowledge. The concatenation of these three matrices is input to a binary classifier.
  • Figure 4: (A) During data collection, users are seated at a distance of approximately 50-60 cm from the screen, as instructed by the Tobii calibration interface. The eye tracker and webcam simultaneously collect data. (B) The data collection platform includes a Python script to read eye tracker data and a web-based PDF reader to read webcam data and record word labeling.
  • Figure 5: A bounding box is derived from the gaze coordinates within a 1-second sliding window. The gaze data, token-level text data, and word-level knowledge data are calculated for each candidate word in the bounding box.
  • ...and 7 more figures
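
The region-of-interest step described in the Figure 5 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the sample format `(timestamp_s, x_px, y_px)`, the word-box format `(left, top, right, bottom)`, the optional `margin_px` padding, and the function names `gaze_bounding_box` and `candidate_words` are all hypothetical. Only the 1-second window comes from the caption.

```python
from typing import Dict, List, Optional, Tuple

GazeSample = Tuple[float, float, float]  # (timestamp_s, x_px, y_px), assumed format
Box = Tuple[float, float, float, float]  # (left, top, right, bottom), assumed format


def gaze_bounding_box(samples: List[GazeSample], now: float,
                      window_s: float = 1.0,
                      margin_px: float = 0.0) -> Optional[Box]:
    """Bounding box of gaze points seen in the last `window_s` seconds."""
    recent = [(x, y) for t, x, y in samples if now - window_s <= t <= now]
    if not recent:
        return None  # no gaze data in the window
    xs = [x for x, _ in recent]
    ys = [y for _, y in recent]
    return (min(xs) - margin_px, min(ys) - margin_px,
            max(xs) + margin_px, max(ys) + margin_px)


def candidate_words(word_boxes: Dict[str, Box], region: Box) -> List[str]:
    """Words whose on-screen boxes overlap the gaze region."""
    left, top, right, bottom = region
    return [word for word, (l, t, r, b) in word_boxes.items()
            if l <= right and r >= left and t <= bottom and b >= top]


# Hypothetical data: four gaze samples and two words laid out on screen.
samples = [(0.0, 10.0, 10.0), (0.5, 100.0, 50.0),
           (1.2, 120.0, 60.0), (1.4, 150.0, 55.0)]
region = gaze_bounding_box(samples, now=1.5)
words = candidate_words({"alpha": (90.0, 45.0, 110.0, 65.0),
                         "beta": (300.0, 45.0, 340.0, 65.0)}, region)
```

In this sketch the sample at 0.0 s falls outside the window, so only the three later samples shape the box, and only the word overlapping that box becomes a candidate for the classifier.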