Dual input stream transformer for vertical drift correction in eye-tracking reading data

Thomas M. Mercier; Marcin Budka; Martin R. Vasilev; Julie A. Kirkby; Bernhard Angele; Timothy J. Slattery

Abstract

We introduce a novel Dual Input Stream Transformer (DIST) for the challenging problem of assigning fixation points from eye-tracking data collected during passage reading to the line of text that the reader was actually focused on. This post-processing step is crucial for the analysis of reading data because the recordings contain noise in the form of vertical drift. We evaluate DIST against eleven classical approaches on a comprehensive suite of nine diverse datasets. We demonstrate that combining multiple instances of the DIST model in an ensemble achieves high accuracy across all datasets. Further combining the DIST ensemble with the best classical approach yields an average accuracy of 98.17%. Our approach represents a significant step towards addressing the bottleneck of manual line assignment in reading research. Through extensive analysis and ablation studies, we identify key factors that contribute to DIST's success, including the incorporation of line overlap features and the use of a second input stream. Rigorous evaluation further shows that DIST is robust to various experimental setups, making it a safe first choice for practitioners in the field.
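The abstract describes two levels of combination: several DIST instances form an ensemble (E-DIST), and that ensemble is then pooled with classical algorithms. The captions of Figures 4 and 5 indicate that E-DIST is allocated a number of votes (three in Figure 4) in a voting pool. The Python sketch below shows one plausible reading of that scheme as a per-fixation weighted majority vote. The pool composition, the rule of one vote per classical algorithm, the lowest-line-index tie-break, and the accuracy definition (fraction of fixations assigned to the annotated line) are all assumptions, not details confirmed by this summary; the functions `majority_vote` and `accuracy` are hypothetical names.

```python
from collections import Counter
from typing import List, Optional, Sequence


def majority_vote(predictions: Sequence[Sequence[int]],
                  weights: Optional[Sequence[int]] = None) -> List[int]:
    """Combine per-fixation line predictions from several models.

    predictions[m][i] is the line index model m assigns to fixation i.
    weights[m] is the number of votes model m holds (default: one each).
    Assumption: ties go to the lowest line index.
    """
    if weights is None:
        weights = [1] * len(predictions)
    n_fixations = len(predictions[0])
    combined = []
    for i in range(n_fixations):
        tally = Counter()
        for preds, w in zip(predictions, weights):
            tally[preds[i]] += w
        best = max(tally.values())
        combined.append(min(line for line, votes in tally.items() if votes == best))
    return combined


def accuracy(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Assumed metric: fraction of fixations assigned to the annotated line."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


# Hypothetical predictions for four fixations (values are line indices).
dist_instances = [[0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 1, 2]]
e_dist = majority_vote(dist_instances)  # unweighted ensemble of DIST runs
classical = [[0, 1, 1, 2], [1, 1, 1, 2]]  # two hypothetical classical outputs
e_woc = majority_vote([e_dist] + classical, weights=[3, 1, 1])
print(e_woc, accuracy(e_woc, [0, 1, 1, 1]))  # [0, 0, 1, 1] 0.75
```

Giving E-DIST an integer weight lets it outvote a split among the classical algorithms while still allowing them to override it when they agree strongly; Figure 5 explores how accuracy changes as that weight varies.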

Paper Structure

This paper contains 9 sections, 3 equations, 12 figures, and 2 tables.

Introduction
Related work
Datasets
Framework
Implementation
Training Setting and Evaluation Metric
Results and Discussion
Ablation Studies
Conclusions

Figures (12)

Figure 1: Fixation density plot illustrating the distribution of the fixation point coordinates for all datasets before and after normalizing the fixation points by subtracting the minimum character bounding box coordinates in each trial (xy-norm) and dividing by the line width and line height (lw-norm). Darker blue colors indicate a higher concentration of fixation points across the trials. The x- and y-axes in Subfigures a) and b) give the coordinates in pixels, while c) is fully normalized and thus has no units. See the supplementary information for an enlarged version of this figure.
Figure 2: Example of the single channel images used as second input stream to the main encoder.
Figure 3: Model flow with the top half showing the fixation information input stream and the bottom half showing the page information stream. s is the length that all sequences are padded to. f is the number of fixation related features. h is the hidden dimension of the main encoder network. l is the maximum number of lines in the datasets.
Figure 4: Cross-validation results for E-WOC, given as average accuracy relative to the accuracy of the best classical algorithm for each dataset. The reported accuracy is based on allocating three votes to E-DIST in the voting pool.
Figure 5: Relative accuracy of E-WOC as a function of the number of votes allocated to the E-DIST model, which uses six DIST instances. Because accuracy varies with which DIST instances make up E-DIST, the experiment was run 15 times for each dataset, with the included instances chosen at random (uniform probability) in each repetition.
Captions for the remaining 7 figures are not reproduced here.
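The Figure 1 caption describes a two-step normalization: subtract the trial's minimum character bounding box coordinates (xy-norm), then divide by line width and line height (lw-norm). The NumPy sketch below applies exactly those two steps. It assumes boxes are stored as `(x_min, y_min, x_max, y_max)` in pixels and that `line_width` and `line_height` are per-trial pixel values; the function name `normalize_fixations` and the data layout are hypothetical, not taken from the paper's code.

```python
import numpy as np


def normalize_fixations(fix_xy, char_boxes, line_width, line_height):
    """Return (xy_norm, lw_norm) for one trial.

    fix_xy:     (n, 2) fixation coordinates (x, y) in pixels.
    char_boxes: (m, 4) character boxes as (x_min, y_min, x_max, y_max), pixels.
    """
    fix_xy = np.asarray(fix_xy, dtype=float)
    char_boxes = np.asarray(char_boxes, dtype=float)
    # xy-norm: shift so the top-left-most character corner becomes the origin.
    origin = char_boxes[:, :2].min(axis=0)
    xy_norm = fix_xy - origin
    # lw-norm: rescale x by line width and y by line height (unitless result).
    lw_norm = xy_norm / np.array([line_width, line_height], dtype=float)
    return xy_norm, lw_norm


# Hypothetical trial: three character boxes on two lines, two fixations.
boxes = [[100, 200, 112, 230], [112, 200, 124, 230], [100, 250, 112, 280]]
fixations = [[106, 215], [118, 265]]
xy, lw = normalize_fixations(fixations, boxes, line_width=24, line_height=50)
print(xy.tolist())  # [[6.0, 15.0], [18.0, 65.0]]
print(lw.tolist())  # [[0.25, 0.3], [0.75, 1.3]]
```

After lw-norm, a y value between k and k + 1 roughly corresponds to line k counted from the first line, which is why the fully normalized panel has no units.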

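The Figure 3 caption fixes the tensor sizes in the fixation stream: sequences are padded to length s, each fixation carries f features, and the output ranges over at most l lines. The NumPy sketch below covers only the bookkeeping around the model, padding a trial to shape (s, f) with a validity mask and turning per-fixation scores of shape (s, l) into line indices. Treating the output as one score per line followed by an argmax is an assumption, and `pad_fixations` and `decode_lines` are hypothetical helpers; the encoder layers themselves are not sketched.

```python
import numpy as np


def pad_fixations(seq, s):
    """Pad an (n, f) fixation-feature array to (s, f); also return an (s,) mask."""
    seq = np.asarray(seq, dtype=float)
    n, f = seq.shape
    if n > s:
        raise ValueError(f"sequence length {n} exceeds padded length {s}")
    padded = np.zeros((s, f))
    padded[:n] = seq
    mask = np.zeros(s, dtype=bool)
    mask[:n] = True  # True marks real fixations, False marks padding
    return padded, mask


def decode_lines(logits, mask):
    """Pick the highest-scoring line for each real fixation from (s, l) scores."""
    return np.argmax(logits, axis=1)[mask]


# Hypothetical trial: 3 fixations with f = 2 features, padded to s = 5, l = 4 lines.
padded, mask = pad_fixations([[0.1, 0.2], [0.4, 0.9], [0.7, 1.6]], s=5)
logits = np.zeros((5, 4))
logits[0, 0] = logits[1, 2] = logits[2, 1] = 1.0
logits[3, 3] = 5.0  # a padded position; the mask discards it
print(padded.shape, decode_lines(logits, mask).tolist())  # (5, 2) [0, 2, 1]
```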