WebQAmGaze: A Multilingual Webcam Eye-Tracking-While-Reading Dataset

Tiago Ribeiro, Stephanie Brandl, Anders Søgaard, Nora Hollenstein

TL;DR

WebQAmGaze is a multilingual, low-cost webcam eye-tracking-while-reading dataset collected from 600 participants reading English, German, Spanish, and Turkish texts. The paper compares webcam gaze data with high-quality lab recordings and finds meaningful correlations in reading behavior, including longer fixations on longer words and differences between normal and information-seeking reading. It shows that gaze metrics, especially when aligned with task-relevant spans, can predict responses to comprehension questions and support explainable-AI rationales, with machine learning classifiers used to distinguish correct from incorrect answers. The dataset and processing pipeline address scalability and ecological validity; the authors acknowledge data-quality challenges and outline concrete future improvements to fixation methods and integration with NLP models. The contribution offers a foundation for using low-cost gaze data to study reading and improve cognitively informed NLP systems.

Abstract

We present WebQAmGaze, a multilingual low-cost eye-tracking-while-reading dataset, designed as the first webcam-based eye-tracking corpus of reading to support the development of explainable computational language processing models. WebQAmGaze includes webcam eye-tracking data from 600 participants of a wide age range naturally reading English, German, Spanish, and Turkish texts. Each participant performs two reading tasks composed of five texts each, a normal reading and an information-seeking task, followed by a comprehension question. We compare the collected webcam data to high-quality eye-tracking recordings. The results show a moderate to strong correlation between the eye movement measures obtained with the webcam and those obtained with a commercial eye-tracking device. When validating the data, we find that higher fixation duration on relevant text spans accurately indicates correctness when answering the corresponding questions. This dataset advances webcam-based reading studies and opens avenues to low-cost and diverse data collection. WebQAmGaze is useful for learning about the cognitive processes behind question-answering and for applying these insights to computational models of language understanding.

Paper Structure

This paper contains 38 sections, 3 equations, 17 figures, and 12 tables.

Figures (17)

  • Figure 1: Experiment structure for the WebQAmGaze data collection. Light blue boxes represent reading pages, purple indicates input from the participants, orange indicates WebGazer calibration and validation steps, and white boxes represent the screens with fixation crosses. A quick calibration step follows every second trial, indicated by the yellow and orange boxes within the two reading tasks, i.e., there is a calibration after the 2nd and 4th trial in the NR task and after the 1st and 3rd trial in the IS task.
  • Figure 2: Example of the collected eye-tracking data and the generated AOIs for XQuAD text NikolaTesla, 5th paragraph. The blue dots show individual fixations, with dot size increasing with fixation duration, and the arrows show the saccades between fixations and their direction. The orange box is the AOI for the paragraph. The green, red, and purple boxes are the target AOI passages needed to answer the three corresponding questions from the XQuAD dataset. Lastly, the grey boxes show the AOI for each word in the text. For this specific set (mturk_EN_v10), the question corresponds to the green box: "What article was published in 1937?".
  • Figure 3: Example of the collected eye-tracking data and the generated AOIs for MECO text #12. The blue dots show individual fixations, with dot size increasing with fixation duration, and the arrows show the saccades between fixations and their direction. The orange box is the AOI for the paragraph. The green, red, purple, and brown boxes are the target AOI passages needed to answer the four corresponding True/False statements from the MECO dataset. Lastly, the grey boxes show the AOI for each word in the text. For this specific set (mturk_EN_v10), the statement corresponds to the red box: "The size of the plates was standardized before World War II.".
  • Figure 4: WebGazer calibration accuracy (in %) for all participant populations ($n=600$; 350 mTurk, 240 volunteer, 10 lab). Medians are displayed as straight lines and means as white dots.
  • Figure 5: Participants' age and WebGazer sampling frequency distribution. Bars in lighter colors show the full data before filtering ($n=600$); darker bars show the filtered data ($n=353$).
  • ...and 12 more figures
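
The captions for Figures 2 and 3 describe fixations overlaid on rectangular AOIs (paragraph, target passage, individual words), and the abstract relates fixation duration on the relevant span to answer correctness. The following is a minimal sketch of how fixation time could be aggregated per AOI. It is not the authors' pipeline: the class names, the pixel-rectangle representation, the `target_share` measure, and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fixation:
    x: float            # horizontal gaze position in pixels (assumed unit)
    y: float            # vertical gaze position in pixels (assumed unit)
    duration_ms: float  # fixation duration in milliseconds


@dataclass(frozen=True)
class AOI:
    """Axis-aligned rectangle; a hypothetical stand-in for an AOI box."""
    label: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, fix: Fixation) -> bool:
        # A fixation counts as inside if it lies on or within the box edges.
        return (self.left <= fix.x <= self.right
                and self.top <= fix.y <= self.bottom)


def total_fixation_time(fixations, aoi):
    """Sum the durations of all fixations that land inside the AOI."""
    return sum(f.duration_ms for f in fixations if aoi.contains(f))


def target_share(fixations, paragraph, target):
    """Fraction of paragraph fixation time spent on the target span.

    This relative measure is an illustrative assumption, not a metric
    taken from the paper.
    """
    paragraph_time = total_fixation_time(fixations, paragraph)
    if paragraph_time == 0:
        return 0.0
    return total_fixation_time(fixations, target) / paragraph_time


# Toy data: the target span lies inside the paragraph box.
paragraph = AOI("paragraph", 0, 0, 800, 400)
target = AOI("target_span", 100, 100, 500, 140)
fixations = [
    Fixation(120, 110, 200),  # on the target span
    Fixation(300, 130, 300),  # on the target span
    Fixation(600, 300, 500),  # in the paragraph, outside the target
    Fixation(900, 50, 250),   # outside the paragraph entirely
]

print(total_fixation_time(fixations, target))      # 500
print(target_share(fixations, paragraph, target))  # 0.5
```

A per-trial value such as `target_share` could then serve as one input feature to a correctness classifier, although the feature set actually used is not specified in this summary.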