Eye-Tracking, Mouse Tracking, Stimulus Tracking, and Decision-Making Datasets in Digital Pathology

Veronica Thai, Rui Li, Meng Ling, Shuning Jiang, Jeremy Wolfe, Raghu Machiraju, Yan Hu, Zaibo Li, Anil Parwani, Jian Chen

TL;DR

PathoGaze1.0 is a comprehensive behavioral dataset that captures the dynamic visual search and decision-making processes of the full diagnostic workflow during cancer diagnosis. It could be used to improve the training of both pathologists and AI systems that might support human experts.

Abstract

Interpretation of gigapixel whole-slide images (WSIs) is an important but difficult task for pathologists. Their diagnostic accuracy is estimated to average around 70%, and adding a second pathologist does not substantially improve decision consistency. The field lacks adequate behavioral data to explain diagnostic errors and inconsistencies. To fill this gap, we present PathoGaze1.0, a comprehensive behavioral dataset capturing the dynamic visual search and decision-making processes of the full diagnostic workflow during cancer diagnosis. The dataset comprises 18.69 hours of eye-tracking, mouse interaction, stimulus tracking, viewport navigation, and diagnostic decision data (EMSVD) collected from 19 pathologists interpreting 397 WSIs. The data collection process emphasizes ecological validity through an application-grounded testbed called PTAH. In total, we recorded 171,909 fixations, 263,320 saccades, and 1,867,362 mouse interaction events. Such data could also be used to improve the training of both pathologists and AI systems that might support human experts. All experiments were preregistered at https://osf.io/hj9a7, and the complete dataset along with analysis code is available at https://go.osu.edu/pathogaze.

Paper Structure

This paper contains 34 sections, 2 equations, 11 figures, 6 tables.

Figures (11)

  • Figure 1: Examples of gaze behaviors from pathologists in PathoGaze1.0. Each whole-slide image (WSI) contains approximately one billion pixels (gigapixel scale). The axis tick labels are scaled down by a factor of $32\times32$ relative to the original image size. Dots represent fixation points, with color and size indicating fixation duration, while connecting lines denote saccades (scan paths). Both fixations and saccades are overlaid on the corresponding original WSIs. Observations. The behavioral data capture distinct viewing strategies. For example, for cases with large tumor regions (top row), Participant P5 made a diagnostic decision without scanning all tumor areas, whereas Participant P10 examined most regions before responding. In challenging small-tumor cases, one participant correctly identified a tumor in one slide but misclassified another. Participant P8 exhibited a search error by failing to fixate on the tumor regions (highlighted in red). In general, mouse movement (blue lines and arrows) aligns with the gaze points, as in the examples from Participant P10. In a few cases, however, such as Participant P5, the mouse does not follow the gaze and instead spans a wider area than the gaze.
  • Figure 2: Viewport and screen coordinates. All data are calibrated in WSI image coordinates, where the upper-left corner of the WSI is (0, 0) and the bottom-right corner is ($x_{image}$, $y_{image}$), the largest coordinate of the WSI at its full pixel resolution. A viewport is the rectangular area on the screen where the WSI is actually rendered and displayed. It excludes surrounding user interface elements. Each viewport was recorded as its corresponding position on the full WSI. Specifically, we stored the pixel coordinates of the viewport’s upper-left corner in the WSI coordinate system, as illustrated in the navigator overview. Fixations were captured in screen coordinates and subsequently transformed to WSI image coordinates.
  • Figure 3: Data directory structure. Shared resources such as common processing scripts and image metric data are placed at the root level. Experiment-specific data are organized into subdirectories named after the first experiment, P10S60T600, and the second, P9D397T540. Both subdirectories contain the same types of data with the same organization: experiment-specific code, raw testbed and eye-tracker data, screen recordings, and processed data. Publicly available via https://go.osu.edu/pathoems.
  • Figure 4:
  • Figure 8: Example 3D scanpath behaviors in orthogonal views. The top-down, front-left, front-right, and 3D views of a participant's behaviors. The scanpath is colored by relative timestamp, and the ground-truth tumor region is marked in green. Observations. This participant started at the large tissue area at the top of the slide, then viewed the areas below at low magnification. In the right region, they zoomed in to view that area in higher detail before zooming back out. When they viewed the tumor area, they zoomed in again before finishing the trial.
  • ...and 6 more figures
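
The Figure 2 caption describes fixations being captured in screen coordinates and then transformed to WSI image coordinates using the recorded upper-left corner of the viewport. A minimal sketch of one way such a mapping could work is given below. It is not the authors' implementation: the `Viewport` class, the screen offset fields, and the `wsi_px_per_screen_px` zoom factor are hypothetical names introduced for illustration, and the helper that divides by 32 assumes the Figure 1 axis scaling applies per axis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Viewport:
    """Hypothetical viewport record; field names are assumptions, not the dataset schema."""
    # Upper-left corner of the viewport in WSI pixel coordinates (described in Figure 2).
    wsi_x: float
    wsi_y: float
    # Assumed: where the rendered viewport rectangle starts on the screen, in screen
    # pixels, so that surrounding user interface elements are excluded.
    screen_left: float
    screen_top: float
    # Assumed zoom representation: WSI pixels covered by one screen pixel.
    wsi_px_per_screen_px: float


def screen_to_wsi(sx: float, sy: float, vp: Viewport) -> tuple[float, float]:
    """Map a screen-space point (e.g. a fixation) to full-resolution WSI coordinates."""
    # Offset of the point inside the viewport rectangle, in screen pixels.
    dx = sx - vp.screen_left
    dy = sy - vp.screen_top
    # Scale to WSI pixels and shift by the viewport's upper-left corner.
    return (vp.wsi_x + dx * vp.wsi_px_per_screen_px,
            vp.wsi_y + dy * vp.wsi_px_per_screen_px)


def wsi_to_plot(x: float, y: float, downscale: float = 32.0) -> tuple[float, float]:
    """Downscale WSI coordinates for plotting, assuming a per-axis factor of 32."""
    return (x / downscale, y / downscale)


if __name__ == "__main__":
    vp = Viewport(wsi_x=10_000, wsi_y=20_000,
                  screen_left=100, screen_top=50,
                  wsi_px_per_screen_px=4.0)
    fixation_wsi = screen_to_wsi(300, 250, vp)
    print(fixation_wsi)                # (10800.0, 20800.0)
    print(wsi_to_plot(*fixation_wsi))  # (337.5, 650.0)
```

In this sketch the WSI origin stays at the upper-left corner, matching the convention in Figure 2, so a fixation at the viewport's own upper-left corner maps back exactly to the stored viewport position.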