
Crowdsourced human-based computational approach for tagging peripheral blood smear sample images from Sickle Cell Disease patients using non-expert users

José María Buades Rubio, Gabriel Moyà-Alcover, Antoni Jaume-i-Capó, Nataša Petrović

TL;DR

The study investigates crowdsourcing PBS image labeling for SCD by engaging non-expert MTurk workers to categorize red blood cells into Circular, Elongated, or Other. A ground-truth-derived evaluation using the erythrocytesIDB dataset reveals that achieving strong consensus among workers dramatically reduces labeling error, enabling high-quality annotations suitable for training automated diagnostic tools. The approach demonstrates concrete metrics (confusion matrices, SDS-score, MCC, etc.) and shows notable gains when combining certain classes and leveraging consensus, while also highlighting cases requiring expert review. Overall, the work establishes a feasible framework for large-scale, crowd-sourced dataset annotation in hematology imaging and points to integration with automated methods and extension to other hemoglobinopathies.

Abstract

In this paper, we present a human-based computation approach for the analysis of peripheral blood smear (PBS) images from patients with Sickle Cell Disease (SCD). We used the Mechanical Turk microtask market to crowdsource the labeling of PBS images. We then used the expert-tagged erythrocytesIDB dataset to assess the accuracy and reliability of our proposal. Our results showed that when a robust consensus is achieved among the Mechanical Turk workers, the probability of error is very low, based on comparison with expert analysis. This suggests that our proposed approach can be used to annotate datasets of PBS images, which can then be used to train automated methods for the diagnosis of SCD. In future work, we plan to explore the potential integration of our findings with outcomes obtained through automated methodologies. This could lead to the development of more accurate and reliable methods for the diagnosis of SCD.
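
The consensus idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the agreement threshold, and the toy votes are all hypothetical, and plain majority voting is assumed as the aggregation rule. It shows how worker votes for one cell might be reduced to a label and an agreement level, and how the error against expert labels could be measured only on cells that reach a chosen level of agreement.

```python
from collections import Counter

def consensus(votes):
    """Return (majority_label, agreement) for one cell's worker votes.

    Agreement is the fraction of workers who chose the majority label.
    Ties are broken by first occurrence (an arbitrary choice for this sketch).
    """
    counts = Counter(votes)
    label, top = counts.most_common(1)[0]
    return label, top / len(votes)

def error_rate_at(cells, min_agreement):
    """Error rate versus expert labels among cells with agreement >= threshold.

    `cells` is a list of (votes, expert_label) pairs.
    Returns (error_rate or None if no cell qualifies, number of cells kept).
    """
    kept = wrong = 0
    for votes, expert in cells:
        label, agreement = consensus(votes)
        if agreement >= min_agreement:
            kept += 1
            wrong += label != expert
    return (wrong / kept if kept else None), kept

# Toy data, invented for illustration only (not taken from erythrocytesIDB).
cells = [
    (["circular"] * 5, "circular"),
    (["circular", "circular", "circular", "other", "other"], "other"),
    (["elongated"] * 4 + ["other"], "elongated"),
    (["other", "other", "other", "elongated", "circular"], "other"),
]

all_cells = error_rate_at(cells, 0.0)       # every cell counts
strong_only = error_rate_at(cells, 0.8)     # only cells with >= 80% agreement
```

In this toy example, raising the threshold discards the weakly agreed cells, including the one the majority got wrong. Cells that fall below the threshold are the natural candidates for expert review.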

Paper Structure (9 sections, 1 equation, 4 figures, 5 tables)

This paper contains 9 sections, 1 equation, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Instructions for cell classification.
  • Figure 2: Cell classification task.
  • Figure 3: MTurk misclassifications. The top row shows circular cells, the middle row shows elongated cells, and the bottom row shows other cell types. Each label shows the class assigned by the MTurk workers; the numbers in parentheses show the votes for circular, elongated, and other. These misclassifications are indicative of the difficulty of accurately classifying cells.
  • Figure 4: Ratio of correctly classified cells with respect to the number of cells classified. This relationship can be approximated by a linear regression. The classification ratio is maintained regardless of the number of cells classified.