Supporting Mitosis Detection AI Training with Inter-Observer Eye-Gaze Consistencies
Hongyan Gu, Zihan Yan, Ayesha Alvi, Brandon Day, Chunxu Yang, Zida Wu, Shino Magaki, Mohammad Haeri, Xiang 'Anthony' Chen
TL;DR
This work proposes using inter-observer eye-gaze consistency as a cost-effective source of training labels for mitosis detection in pathology. By aggregating fixations across groups of participants and extracting the centroids of heatmap hotspots, the authors generate eye-gaze labels that guide the training of a CNN (EfficientNet-B3) through a two-iteration pipeline that combines active learning with GradCAM++-based localization. CNNs trained on eye-gaze labels closely approach the performance of those trained on ground-truth annotations and significantly outperform those trained on a heuristic color-based labeling baseline, with precision improving notably as the group size increases to $k=14$. The study demonstrates a practical, non-disruptive data-collection approach that could generalize to other medical imaging tasks, although a recall gap relative to expert-labeled data remains and the approach still needs validation with pathologist participants.
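The consistency-based labeling idea can be illustrated with a minimal sketch, assuming fixations are given as (x, y) pixel coordinates, one list per observer. This is not the authors' implementation: the function name consistent_gaze_centroids and its parameters (grid cell size, neighborhood radius, minimum number of agreeing observers) are hypothetical. Each observer marks a small neighborhood around each of their fixations on a coarse grid, counted at most once per observer, so the summed map records how many observers looked at each cell. Cells reached by at least min_observers observers form hotspots, and the count-weighted centroid of each connected hotspot becomes one eye-gaze label.

```python
from collections import deque


def consistent_gaze_centroids(fixations_by_observer, width, height,
                              cell=16, radius=1, min_observers=2):
    """Hypothetical sketch: turn per-observer fixations into consensus labels.

    fixations_by_observer: list of lists of (x, y) pixel coordinates.
    Returns a list of (x, y) centroids in pixel coordinates.
    """
    cols = (width + cell - 1) // cell
    rows = (height + cell - 1) // cell
    counts = [[0] * cols for _ in range(rows)]

    # Each observer contributes at most 1 to any cell, so counts[r][c]
    # is the number of distinct observers whose gaze covered that cell.
    for fixations in fixations_by_observer:
        covered = set()
        for x, y in fixations:
            cx, cy = int(x) // cell, int(y) // cell
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    r, c = cy + dy, cx + dx
                    if 0 <= r < rows and 0 <= c < cols:
                        covered.add((r, c))
        for r, c in covered:
            counts[r][c] += 1

    # Flood-fill 4-connected hotspots of cells with enough observers and
    # return each hotspot's count-weighted centroid (cell centers in pixels).
    seen = [[False] * cols for _ in range(rows)]
    centroids = []
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0] or counts[r0][c0] < min_observers:
                continue
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            wsum = xsum = ysum = 0.0
            while queue:
                r, c = queue.popleft()
                w = counts[r][c]
                wsum += w
                xsum += w * (c + 0.5) * cell
                ysum += w * (r + 0.5) * cell
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols and not seen[nr][nc]
                            and counts[nr][nc] >= min_observers):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            centroids.append((xsum / wsum, ysum / wsum))
    return centroids


# Two observers agree near (100, 100); a third looks elsewhere alone.
observers = [[(100, 100)], [(104, 98)], [(400, 400)]]
print(consistent_gaze_centroids(observers, 512, 512))  # [(104.0, 104.0)]
```

Raising min_observers toward the group size keeps only locations shared by more observers, trading recall for precision; the thresholds and kernel actually used in the study are not stated in this section.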
Abstract
The expansion of artificial intelligence (AI) in pathology tasks has intensified the demand for doctors' annotations in AI development. However, collecting high-quality annotations from doctors is costly and time-consuming, creating a bottleneck in AI progress. This study investigates eye-tracking as a cost-effective technology to collect doctors' behavioral data for AI training, with a focus on the pathology task of mitosis detection. One major challenge in using eye-gaze data is its low signal-to-noise ratio, which hinders the extraction of meaningful information. We tackled this by leveraging the properties of inter-observer eye-gaze consistency and creating eye-gaze labels from consistent eye fixations shared by a group of observers. Our study involved 14 non-medical participants, from whom we collected eye-gaze data and generated eye-gaze labels based on varying group sizes. We assessed the efficacy of these eye-gaze labels by training Convolutional Neural Networks (CNNs) and comparing their performance to that of CNNs trained with ground-truth annotations and with a heuristic-based baseline. Results indicated that CNNs trained with our eye-gaze labels closely followed the performance of ground-truth-based CNNs and significantly outperformed the baseline. Although this study focuses on mitosis, we envision that its insights can be generalized to other medical imaging tasks.
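The abstract compares CNNs by detection performance (precision and recall), but this section does not specify the scoring protocol. One common way to score point-style detections, shown here only as a hedged sketch, greedily matches each predicted location to at most one ground-truth mitosis within a distance threshold. The function name score_detections and the 30-pixel default for max_dist are hypothetical choices, not the study's settings.

```python
import math


def score_detections(predictions, ground_truth, max_dist=30.0):
    """Hypothetical sketch: greedy one-to-one matching of point detections.

    predictions, ground_truth: lists of (x, y) pixel coordinates.
    Returns (precision, recall, f1).
    """
    # Collect every prediction/ground-truth pair close enough to count.
    pairs = []
    for i, p in enumerate(predictions):
        for j, g in enumerate(ground_truth):
            d = math.dist(p, g)
            if d <= max_dist:
                pairs.append((d, i, j))

    # Accept the closest pairs first; each point may be matched only once.
    pairs.sort()
    used_pred, used_gt = set(), set()
    tp = 0
    for _, i, j in pairs:
        if i not in used_pred and j not in used_gt:
            used_pred.add(i)
            used_gt.add(j)
            tp += 1

    fp = len(predictions) - tp  # unmatched predictions
    fn = len(ground_truth) - tp  # missed mitoses
    precision = tp / (tp + fp) if predictions else 0.0
    recall = tp / (tp + fn) if ground_truth else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1


print(score_detections([(10, 10), (200, 200), (50, 50)], [(12, 10), (300, 300)]))
```

Under this scheme, labels that add spurious detections lower precision while missed mitoses lower recall, which is the distinction the TL;DR draws between the precision gains and the remaining recall gap.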
