Generalizable Blood Cell Detection via Unified Dataset and Faster R-CNN

Siddharth Sahay

TL;DR

The paper addresses automated peripheral blood cell detection amid heterogeneous data sources by constructing a unified multi-source dataset and evaluating Faster R-CNN with a ResNet-50-FPN backbone. It demonstrates that transfer learning from COCO accelerates convergence and improves most detection metrics compared to random initialization. Per-class analysis reveals strong performance on common cell types but critical data scarcity for rare classes, underscoring the need for targeted data augmentation and synthetic data or few-shot approaches. The study provides a robust, generalizable pipeline for deployable hematology diagnostics, with implications for scalable automated screening in clinical settings.

Abstract

This paper presents a methodology and comparative performance analysis for the automated classification and object detection of peripheral blood cells (PBCs) in microscopic images. To address the critical challenge of data scarcity and heterogeneity, a robust data pipeline was first developed to standardize and merge four public datasets (PBC, BCCD, Chula, Sickle Cell) into a unified resource. A state-of-the-art Faster R-CNN object detection framework with a ResNet-50-FPN backbone was then employed. Comparative training evaluated a randomly initialized baseline model (Regimen 1) against a Transfer Learning Regimen (Regimen 2), initialized with weights pre-trained on the Microsoft COCO dataset. The results demonstrate that the Transfer Learning approach achieved significantly faster convergence and superior stability, culminating in a final validation loss of 0.08666, a substantial improvement over the baseline. This validated methodology establishes a robust foundation for building high-accuracy, deployable systems for automated hematological diagnosis.
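As a rough illustration of what standardizing heterogeneous annotation sources can involve, the sketch below maps source-specific class names onto one shared label set and converts normalized center-format boxes to absolute corner coordinates. It is not the paper's pipeline: the source names (`source_a`, `source_b`), the class list, the alias tables, and the two input formats are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical unified label set; the paper's actual class list is not shown here.
UNIFIED_LABELS = {"rbc": 1, "wbc": 2, "platelet": 3}

# Hypothetical per-source alias tables: raw label -> unified class name.
ALIASES = {
    "source_a": {"RBC": "rbc", "WBC": "wbc", "Platelets": "platelet"},
    "source_b": {"0": "rbc", "1": "wbc", "2": "platelet"},
}


@dataclass(frozen=True)
class Box:
    """One annotation in the unified format: integer class id, absolute pixel corners."""
    label_id: int
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def center_to_corners(cx, cy, w, h, img_w, img_h):
    """Convert a normalized (cx, cy, w, h) box to absolute (x_min, y_min, x_max, y_max)."""
    return (
        (cx - w / 2) * img_w,
        (cy - h / 2) * img_h,
        (cx + w / 2) * img_w,
        (cy + h / 2) * img_h,
    )


def unify(source, raw_label, coords, fmt, img_w, img_h):
    """Map one raw annotation to the unified Box format.

    fmt is "corners" (absolute pixels) or "center" (normalized); both are
    assumed formats chosen for this sketch.
    """
    try:
        label_id = UNIFIED_LABELS[ALIASES[source][raw_label]]
    except KeyError as exc:
        raise ValueError(f"unknown label {raw_label!r} for source {source!r}") from exc

    if fmt == "center":
        coords = center_to_corners(*coords, img_w, img_h)
    elif fmt != "corners":
        raise ValueError(f"unsupported box format: {fmt!r}")

    x_min, y_min, x_max, y_max = coords
    if x_max <= x_min or y_max <= y_min:
        raise ValueError("degenerate box")  # drop or fix such annotations upstream
    return Box(label_id, float(x_min), float(y_min), float(x_max), float(y_max))


if __name__ == "__main__":
    print(unify("source_a", "WBC", (10, 20, 50, 80), "corners", 640, 480))
    print(unify("source_b", "2", (0.5, 0.5, 0.25, 0.5), "center", 100, 200))
```

Rejecting unknown labels and degenerate boxes at merge time, rather than during training, keeps annotation errors from one source from silently contaminating the combined dataset.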

Paper Structure

This paper contains 23 sections, 3 figures, 7 tables, 1 algorithm.

Figures (3)

  • Figure 1: Loss Convergence Comparison. A plot showing the averaged training and validation loss over 25 epochs for both Regimen 1 (Baseline) and Regimen 2 (Transfer Learning). The figure demonstrates the rapid convergence and superior stability achieved by the Transfer Learning Regimen.
  • Figure 2: Qualitative Detection Results: Actual vs. Predicted. Detection outputs from the Transfer Learning model (Regimen 2) across datasets, demonstrating accurate localization and classification. (A) PBC Dataset, (B) BCCD Dataset, (C) Sickle Cell Sample.
  • Figure 3: Qualitative Failure Case Analysis (Regimen 2). Examples of common prediction errors. Ground truth is shown in green and model predictions in red.