ZuCo 2.0: A Dataset of Physiological Recordings During Natural Reading and Annotation

Nora Hollenstein, Marius Troendle, Ce Zhang, Nicolas Langer

TL;DR

ZuCo 2.0 is a freely available corpus of simultaneous eye-tracking and EEG data collected during natural reading and during an annotation task, enabling direct comparisons of cognitive processing across the two reading modes. It comprises data from 18 participants reading 739 English Wikipedia sentences: 349 under normal reading and 390 under task-specific annotation, in which participants search for a semantic relation type. The recordings are preprocessed, and data quality is validated through fixation-related potential (FRP) and eye-tracking analyses. The resource complements ZuCo 1.0 and supports research on relation extraction, cognitive load during reading, NLP evaluation, and brain-informed language modeling.

Abstract

We recorded and preprocessed ZuCo 2.0, a new dataset of simultaneous eye-tracking and electroencephalography during natural reading and during annotation. This corpus contains gaze and brain activity data of 739 sentences, 349 in a normal reading paradigm and 390 in a task-specific paradigm, in which the 18 participants actively search for a semantic relation type in the given sentences as a linguistic annotation task. This new dataset complements ZuCo 1.0 by providing experiments designed to analyze the differences in cognitive processing between natural reading and annotation. The data is freely available here: https://osf.io/2urht/.

Paper Structure

This paper contains 21 sections, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Visualization of eye-tracking and EEG data for a single sentence. (a) Prototypical sentence fixation data. Red crosses indicate fixations; boxes around the words indicate the word boundaries. (b) Fixation data plotted over time. (c) Raw EEG data during a single sentence. (d) Same data as in (c) after preprocessing.
  • Figure 2: Example sentences on the recording screen: (left) a normal reading sentence, (middle) a control question for a normal reading sentence, and (right) a task-specific annotation sentence.
  • Figure 3: Sentence length (words per sentence), reading speed (seconds per sentence) and omission rate (percentage of words not fixated) comparison between normal reading (NR) and task-specific reading (TSR).
  • Figure 4: Skipping proportion on word level for both tasks.
  • Figure 5: Fixation heatmaps for two sentences containing the relation "founder", comparing the eye-tracking features between normal reading and task-specific annotation reading: first fixation duration (FFD), total reading time (TRT), and number of fixations (nFix).
  • ...and 4 more figures
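
The word-level measures named in the captions above (FFD, TRT, nFix, and the omission rate) can be illustrated with a minimal Python sketch. It assumes fixations have already been mapped to word indices and are given in chronological order as `(word_index, duration_ms)` pairs. That input format and the function name `word_level_features` are hypothetical, chosen for illustration; they are not the layout of the released ZuCo files or the authors' extraction code.

```python
from collections import defaultdict


def word_level_features(fixations, n_words):
    """Compute per-word FFD, TRT, nFix and the sentence omission rate.

    fixations: chronological list of (word_index, duration_ms) pairs
               (hypothetical input format, assumed for this sketch).
    n_words:   number of words in the sentence.
    """
    ffd = {}                  # first fixation duration per word
    trt = defaultdict(int)    # total reading time per word
    nfix = defaultdict(int)   # number of fixations per word

    for word_idx, duration in fixations:
        if word_idx not in ffd:
            ffd[word_idx] = duration  # only the first fixation counts for FFD
        trt[word_idx] += duration
        nfix[word_idx] += 1

    features = [
        {"FFD": ffd.get(i), "TRT": trt.get(i, 0), "nFix": nfix.get(i, 0)}
        for i in range(n_words)
    ]
    # Omission rate: percentage of words that received no fixation.
    skipped = sum(1 for f in features if f["nFix"] == 0)
    omission_rate = 100.0 * skipped / n_words
    return features, omission_rate


# Toy four-word sentence: word 0 is fixated twice, word 2 is skipped.
features, omission_rate = word_level_features(
    [(0, 180), (1, 220), (0, 90), (3, 200)], n_words=4
)
```

In this toy input, word 0 has an FFD of 180 ms and a TRT of 270 ms over two fixations, while word 2 is never fixated, so one of the four words counts toward the omission rate.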