
Ensemble of Small Classifiers For Imbalanced White Blood Cell Classification

Siddharth Srivastava, Adam Smith, Scott Brooks, Jack Bacon, Till Bretschneider

Abstract

Automating white blood cell classification for the diagnosis of leukaemia is a promising alternative to time-consuming and resource-intensive examination of cells by expert pathologists. However, designing robust algorithms for classifying rare cell types remains challenging because of variations in staining and scanning and because of inter-patient heterogeneity. We propose a lightweight ensemble approach for classifying cells during haematopoiesis, with a focus on the biology of granulopoiesis, monocytopoiesis and lymphopoiesis. After expanding the dataset to alleviate some of the class imbalance, we demonstrate that a simple ensemble of lightweight pretrained SwinV2-Tiny, DinoBloom-Small and ConvNeXT-V2-Tiny models achieves excellent performance on this challenging dataset. We train 3 instantiations of each architecture in a stratified 3-fold cross-validation framework; for an input image, we forward-pass through all 9 models and aggregate their outputs by logit averaging. We further examine the weaknesses of our model, namely its confusion of similar-looking myelocytes in granulopoiesis and lymphocytes in lymphopoiesis. Code: https://gitlab.com/siddharthsrivastava/wbc-bench-2026.
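The training and aggregation scheme described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `stratified_folds`, `average_logits`, `ensemble_predict` and `make_toy_model` are hypothetical names, the round-robin fold assignment is one simple way to stratify, and the "models" are toy functions standing in for the fine-tuned networks.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

Logits = List[float]
Model = Callable[[object], Logits]  # hypothetical: maps an image to class logits


def stratified_folds(labels: Sequence[str], k: int = 3, seed: int = 0) -> List[int]:
    """Assign each sample a fold id so every class is spread evenly over k folds."""
    rng = random.Random(seed)
    by_class: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [0] * len(labels)
    for indices in by_class.values():
        rng.shuffle(indices)
        # Round-robin within a class: per-class fold sizes differ by at most one.
        for position, idx in enumerate(indices):
            folds[idx] = position % k
    return folds


def average_logits(logit_vectors: Sequence[Sequence[float]]) -> Logits:
    """Element-wise mean of equally weighted logit vectors."""
    if not logit_vectors:
        raise ValueError("need at least one logit vector")
    n_classes = len(logit_vectors[0])
    if any(len(v) != n_classes for v in logit_vectors):
        raise ValueError("all logit vectors must have the same length")
    return [sum(column) / len(logit_vectors) for column in zip(*logit_vectors)]


def ensemble_predict(models: Sequence[Model], image: object) -> int:
    """Forward-pass through every model, average the logits, return the argmax."""
    mean_logits = average_logits([model(image) for model in models])
    return max(range(len(mean_logits)), key=mean_logits.__getitem__)


def make_toy_model(favoured_class: int, n_classes: int = 13) -> Model:
    """Toy stand-in for a fine-tuned network: always favours one class."""
    def model(_image: object) -> Logits:
        return [1.0 if c == favoured_class else 0.0 for c in range(n_classes)]
    return model


# Nine toy models play the role of 3 architectures x 3 held-out folds.
toy_models = [make_toy_model(c) for c in (4, 4, 4, 4, 4, 2, 2, 7, 7)]
prediction = ensemble_predict(toy_models, image=None)
print(prediction)
```

In a real pipeline each of the three models per architecture would be fine-tuned on the two folds whose id differs from its held-out fold, and averaging logits rather than hard votes lets confident models contribute more to the final decision.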

Paper Structure

This paper contains 11 sections, 1 equation, 4 figures, 1 table.

Figures (4)

  • Figure 1: Example phase 1 (unperturbed) images for each class from the WBCBench competition dataset.
  • Figure 2: Number of samples per class in our expanded dataset.
  • Figure 3: Model architecture. Left: our model is an ensemble of $3$ different architectures: a ConvNeXT-V2 Tiny [10205236], a SwinV2-Tiny [Liu_2022_CVPR], and a DinoBloom-Small [Koc_DinoBloom_MICCAI2024] model. In each, we replace the pretrained classification head with a simple MLP $h$. Each instantiation of an architecture is fine-tuned independently on $2$ out of $3$ folds of data and validated on the remaining fold to mitigate overfitting, giving $3$ models per architecture and $9$ models in total. The final model $\mathcal{M}$ passes an input image through all $9$ models and averages their logits to produce the $13$-class logit vector. Right: at inference, we generate multiple augmented views of each image via random flips and rotations. These views are processed independently by $\mathcal{M}$ and their logits are averaged to obtain the final class prediction.
  • Figure 4: Confusion matrices of our ensemble trained on the expanded dataset. Left: confusion matrix of the $1149$ high-confidence misclassified examples found using confident learning [northcutt2021confidentlearning], $841$ of which come from WBCBench and the rest from [Acevedo20]. The dotted red box groups the classes in granulopoiesis and the green box captures lymphopoiesis; MO occurs in monocytopoiesis. Right: confusion matrix generated using out-of-fold examples for each group of models.
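The inference-time augmentation described in the right panel of Figure 3 can be illustrated with a small, self-contained sketch. It rests on several assumptions: images are toy single-channel grids stored as nested lists, rotations are restricted to multiples of 90 degrees (the caption does not state the angles used), and `random_view`, `tta_predict` and `toy_model` are hypothetical names rather than functions from the authors' code.

```python
import random
from typing import Callable, List, Tuple

Image = List[List[float]]  # toy single-channel image stored as rows of pixels
Logits = List[float]


def hflip(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def vflip(img: Image) -> Image:
    """Mirror the image top to bottom."""
    return [list(row) for row in img[::-1]]


def rot90(img: Image) -> Image:
    """Rotate the image by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def random_view(img: Image, rng: random.Random) -> Image:
    """Apply random flips and a random number of quarter turns (assumed angles)."""
    view = [list(row) for row in img]
    if rng.random() < 0.5:
        view = hflip(view)
    if rng.random() < 0.5:
        view = vflip(view)
    for _ in range(rng.randrange(4)):
        view = rot90(view)
    return view


def tta_predict(
    model: Callable[[Image], Logits], img: Image, n_views: int = 8, seed: int = 0
) -> Tuple[int, Logits]:
    """Average the model's logits over augmented views and return (class, logits)."""
    rng = random.Random(seed)
    all_logits = [model(random_view(img, rng)) for _ in range(n_views)]
    mean_logits = [sum(column) / n_views for column in zip(*all_logits)]
    predicted = max(range(len(mean_logits)), key=mean_logits.__getitem__)
    return predicted, mean_logits


def toy_model(img: Image) -> Logits:
    """Toy stand-in for the ensemble: logits depend only on total intensity."""
    total = float(sum(map(sum, img)))
    return [total, -total, 0.0]


image = [[1.0, 2.0], [3.0, 4.0]]
predicted_class, mean_logits = tta_predict(toy_model, image)
print(predicted_class, mean_logits)
```

Because white blood cell images have no canonical orientation, flips and rotations are label-preserving, so averaging over such views is a cheap way to reduce the variance of a prediction. The toy model above is invariant to these transforms by construction; a real network would generally give slightly different logits per view.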