Fine-Grained Prediction of Reading Comprehension from Eye Movements
Omer Shubi, Yoav Meiri, Cfir Avraham Hadar, Yevgeni Berzak
TL;DR
This work investigates whether eye movements can enable fine-grained prediction of reading comprehension at the level of a single question over a paragraph. It introduces the large-scale OneStop Eye Movements dataset and three multimodal transformer models (RoBERTa-QEye, MAG-QEye, PostFusion-QEye) that fuse gaze data with text, and evaluates them under two reading regimes: ordinary reading and information seeking. Across extensive cross-validation with strict generalization tests, eye movements provide informative but modest gains over strong text-only baselines, and the size of these gains varies by regime and task. The findings highlight both the potential and the limits of gaze-based comprehension assessment, and point to the need for more data, robust multimodal architectures, and careful comparisons against baselines before such methods can be reliably deployed.
Abstract
Can human reading comprehension be assessed from eye movements in reading? In this work, we address this longstanding question using large-scale eyetracking data over textual materials that are geared towards behavioral analyses of reading comprehension. We focus on a fine-grained and largely unaddressed task of predicting reading comprehension from eye movements at the level of a single question over a passage. We tackle this task using three new multimodal language models, as well as a battery of prior models from the literature. We evaluate the models' ability to generalize to new textual items, new participants, and the combination of both, in two different reading regimes, ordinary reading and information seeking. The evaluations suggest that although the task is highly challenging, eye movements contain useful signals for fine-grained prediction of reading comprehension. Code and data will be made publicly available.
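The three generalization conditions named above (new textual items, new participants, and both) can be illustrated with a minimal sketch. It assumes each trial is identified only by a `(participant_id, item_id)` pair and that one set of participants and one set of items are held out; the function `make_generalization_splits` and the split names are hypothetical and are not taken from the paper's code. Trials whose participant and item are both seen go to training, and the remaining trials are routed to the evaluation condition that matches what was held out.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical trial format: (participant_id, item_id).
Trial = Tuple[str, str]


def make_generalization_splits(
    trials: Iterable[Trial],
    heldout_participants: Set[str],
    heldout_items: Set[str],
) -> Dict[str, List[Trial]]:
    """Partition trials into a training set and three evaluation conditions.

    Each evaluation condition shares no participant and/or no item with the
    training set, depending on which dimension it tests.
    """
    splits: Dict[str, List[Trial]] = {
        "train": [],                     # seen participant, seen item
        "new_item": [],                  # seen participant, unseen item
        "new_participant": [],           # unseen participant, seen item
        "new_item_and_participant": [],  # unseen participant, unseen item
    }
    for participant, item in trials:
        participant_out = participant in heldout_participants
        item_out = item in heldout_items
        if not participant_out and not item_out:
            key = "train"
        elif item_out and not participant_out:
            key = "new_item"
        elif participant_out and not item_out:
            key = "new_participant"
        else:
            key = "new_item_and_participant"
        splits[key].append((participant, item))
    return splits


if __name__ == "__main__":
    # Toy grid: 3 participants x 3 items, holding out one of each.
    toy_trials = [(p, i) for p in ["p1", "p2", "p3"] for i in ["i1", "i2", "i3"]]
    toy_splits = make_generalization_splits(toy_trials, {"p3"}, {"i3"})
    print({name: len(rows) for name, rows in toy_splits.items()})
    # → {'train': 4, 'new_item': 2, 'new_participant': 2, 'new_item_and_participant': 1}
```

Under cross-validation, the held-out participant and item sets would presumably be rotated across folds, and the procedure could be run separately for each reading regime; how the paper assigns folds is not specified here.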
