
Synergizing Deep Learning and Biological Heuristics for Extreme Long-Tail White Blood Cell Classification

Duc T. Nguyen, Hoang-Long Nguyen, Huy-Hieu Pham

Abstract

Automated white blood cell (WBC) classification is essential for leukemia screening but remains challenged by extreme class imbalance, long-tail distributions, and domain shift, which cause deep models to overfit dominant classes and fail on rare subtypes. We propose a hybrid framework for rare-class generalization that integrates a generative Pix2Pix-based restoration module for artifact removal, a Swin Transformer ensemble with MedSigLIP contrastive embeddings for robust representation learning, and a biologically inspired refinement step that uses geometric spikiness and Mahalanobis-based morphological constraints to recover out-of-distribution predictions. Evaluated on the WBCBench 2026 challenge, our method achieves a Macro-F1 of 0.77139 on the private leaderboard, demonstrating strong performance under severe imbalance and highlighting the value of incorporating biological priors into deep learning for hematological image analysis.
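The reported metric, Macro-F1, is (under its standard definition) the unweighted mean of per-class F1 scores, so a rare subtype counts exactly as much as a dominant one. The minimal sketch below uses hypothetical class labels and toy predictions, not WBCBench data, to show why this metric exposes a model that simply predicts the majority class even when its accuracy looks high. The function name macro_f1 is illustrative, and whether the challenge scorer handles edge cases identically is an assumption.

```python
from collections import Counter

def macro_f1(y_true, y_pred, labels=None):
    """Unweighted mean of per-class F1; every class counts equally."""
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the prediction was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]  # F1 = 2TP / (2TP + FP + FN)
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy example with hypothetical labels: 8 common cells, 2 rare ones,
# and a model that always predicts the majority class.
y_true = ["neutrophil"] * 8 + ["lymphocyte", "blast"]
y_pred = ["neutrophil"] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                  # → 0.8
print(macro_f1(y_true, y_pred))  # ≈ 0.296
```

The majority-class model scores 0.8 accuracy but only about 0.296 Macro-F1, because both rare classes contribute an F1 of zero to the average.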

Paper Structure (13 sections, 2 figures, 2 tables, 1 algorithm)

Figures (2)

  • Figure 1: Class distribution of the WBCBench 2026 dataset.
  • Figure 2: Overview of the proposed three-stage hybrid framework. The pipeline integrates generative domain restoration (Stage 1), dual-branch semantic feature extraction (Stage 2), and biological filtering (Stage 3) for robust classification under extreme long-tail distributions.
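The biological filtering of Stage 3 can be illustrated with a small sketch. The exact definitions of geometric spikiness and of the Mahalanobis-based constraint are not given here, so this code rests on stated assumptions and is not the authors' implementation. Spikiness is hypothetically taken as 1 minus the solidity of the cell contour (contour area divided by convex-hull area). Each class is summarized by the mean and a regularized covariance of its morphological feature vectors. A predicted label is kept only if the Mahalanobis distance of the cell's features to that class stays under a hypothetical threshold; otherwise the cell is reassigned to the nearest class. All function names, the threshold, and the toy data are illustrative.

```python
import numpy as np

def polygon_area(pts):
    """Shoelace formula for a closed polygon given as an (N, 2) array."""
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def convex_hull(pts):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(map(tuple, pts))
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return np.array(lower[:-1] + upper[:-1], dtype=float)

def spikiness(contour):
    """Hypothetical spikiness: 1 - solidity (0 for a convex contour)."""
    return 1.0 - polygon_area(contour) / polygon_area(convex_hull(contour))

def fit_class_stats(features, labels, eps=1e-6):
    """Per-class mean and inverse (regularized) covariance of feature vectors."""
    stats = {}
    for c in np.unique(labels):
        X = features[labels == c]
        cov = np.cov(X, rowvar=False) + eps * np.eye(X.shape[1])
        stats[c] = (X.mean(axis=0), np.linalg.inv(cov))
    return stats

def mahalanobis(x, mu, inv_cov):
    d = x - mu
    return float(np.sqrt(d @ inv_cov @ d))

def refine(pred, x, stats, threshold=3.0):
    """Keep the network's label if x is morphologically plausible for it;
    otherwise fall back to the class with the smallest Mahalanobis distance."""
    if mahalanobis(x, *stats[pred]) <= threshold:
        return pred
    return min(stats, key=lambda c: mahalanobis(x, *stats[c]))

# Toy usage with synthetic 2-D features (e.g. [nucleus area, spikiness]); not real data.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0.0, 0.0], 1.0, (50, 2)),
               rng.normal([10.0, 10.0], 1.0, (50, 2))])
y = np.array(["A"] * 50 + ["B"] * 50)
stats = fit_class_stats(X, y)
print(spikiness(np.array([[0, 0], [2, 0], [2, 2], [1, 1], [0, 2]], dtype=float)))  # → 0.25
print(refine("A", np.array([10.0, 10.0]), stats))  # → B
```

Adding a small `eps` to each covariance keeps the inverse numerically stable for classes with few samples, which in a long-tail dataset are exactly the rare classes this step is meant to recover.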