Crowdsourced human-based computational approach for tagging peripheral blood smear sample images from Sickle Cell Disease patients using non-expert users
José María Buades Rubio, Gabriel Moyà-Alcover, Antoni Jaume-i-Capó, Nataša Petrović
TL;DR
The study investigates crowdsourcing PBS image labeling for SCD by engaging non-expert MTurk workers to categorize red blood cells into Circular, Elongated, or Other. A ground-truth-derived evaluation using the erythrocytesIDB dataset reveals that achieving strong consensus among workers dramatically reduces labeling error, enabling high-quality annotations suitable for training automated diagnostic tools. The approach demonstrates concrete metrics (confusion matrices, SDS-score, MCC, etc.) and shows notable gains when combining certain classes and leveraging consensus, while also highlighting cases requiring expert review. Overall, the work establishes a feasible framework for large-scale, crowd-sourced dataset annotation in hematology imaging and points to integration with automated methods and extension to other hemoglobinopathies.
Abstract
In this paper, we present a human-based computation approach for the analysis of peripheral blood smear (PBS) images from patients with Sickle Cell Disease (SCD). We used the Mechanical Turk microtask market to crowdsource the labeling of PBS images. We then used the expert-tagged erythrocytesIDB dataset to assess the accuracy and reliability of our proposal. Our results show that when a robust consensus is achieved among the Mechanical Turk workers, the probability of error is very low, based on comparison with expert analysis. This suggests that our proposed approach can be used to annotate datasets of PBS images, which can then be used to train automated methods for the diagnosis of SCD. In future work, we plan to explore the potential integration of our findings with outcomes obtained through automated methodologies. This could lead to the development of more accurate and reliable methods for the diagnosis of SCD.
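
The consensus idea described in the abstract can be illustrated with a minimal sketch. It is not the aggregation procedure used in the study: the function name `consensus_label` and the 0.8 agreement threshold are assumptions chosen for illustration, and only the three class names (Circular, Elongated, Other) come from the text above. The sketch takes the labels that several workers assigned to one cell, keeps the majority label when enough workers agree, and otherwise flags the cell for expert review.

```python
from collections import Counter


def consensus_label(votes, min_agreement=0.8):
    """Aggregate worker labels for a single cell.

    Returns (label, agreement). The label is None when the share of the
    most frequent label is below min_agreement, meaning the cell should be
    sent to an expert. The 0.8 default is an illustrative assumption.
    """
    if not votes:
        return None, 0.0
    # Most frequent label and the number of workers who chose it.
    label, count = Counter(votes).most_common(1)[0]
    agreement = count / len(votes)
    if agreement >= min_agreement:
        return label, agreement
    return None, agreement


# Hypothetical votes: nine of ten workers agree, so the label is accepted.
strong = consensus_label(["Circular"] * 9 + ["Other"])

# Hypothetical votes: workers disagree, so the cell is flagged for review.
weak = consensus_label(["Circular", "Elongated", "Other", "Circular"])
```

Raising `min_agreement` trades coverage for reliability: fewer cells receive a crowd label, but those that do are backed by a stronger consensus, which is the regime in which the abstract reports a very low probability of error.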
