ZuCo 2.0: A Dataset of Physiological Recordings During Natural Reading and Annotation
Nora Hollenstein, Marius Troendle, Ce Zhang, Nicolas Langer
TL;DR
ZuCo 2.0 provides the first freely available corpus of simultaneous eye-tracking and EEG data collected during both natural reading and an annotation task, enabling direct comparisons of cognitive processing across the two reading modes. It comprises data from 18 participants reading 739 English Wikipedia sentences under normal reading and task-specific annotation conditions, together with preprocessing and feature extraction pipelines. The work validates data quality through fixation-related potential (FRP) and eye-tracking analyses and discusses reuse potential for NLP evaluation and brain-informed language modeling. This resource supports research on relation extraction, cognitive load during reading, and the neural correlates of textual understanding in realistic settings.
Abstract
We recorded and preprocessed ZuCo 2.0, a new dataset of simultaneous eye-tracking and electroencephalography (EEG) during natural reading and during annotation. The corpus contains gaze and brain activity data for 739 sentences: 349 in a normal reading paradigm and 390 in a task-specific paradigm in which the 18 participants actively search for a semantic relation type in the given sentences as a linguistic annotation task. This new dataset complements ZuCo 1.0 by providing experiments designed to analyze the differences in cognitive processing between natural reading and annotation. The data is freely available here: https://osf.io/2urht/.
